Powerful large language models (LLMs) now play essential roles in many real-world applications. But as humans become increasingly dependent on LLMs, some are questioning whether or to what extent we can trust them to deliver the “truth.”
In the new paper Discovering Latent Knowledge in Language Models Without Supervision, a research team from UC Berkeley and Peking University presents Contrast-Consistent Search (CCS), an unsupervised approach for discovering latent knowledge in language models.

The research team argues there are a number of ways conventional LLMs can become “misaligned with the truth.” If a model is trained via imitation learning, it may simply reproduce its human demonstrators’ errors and misconceptions. If a model’s outputs are instead rated by humans (reward optimization), the generated text may be coherent and compelling while still containing errors that human raters cannot detect.
To circumvent these issues, the team looks past models’ explicit outputs and instead targets the implicit, internal “beliefs” or “knowledge” they learn during training, using CCS to detect and recover that knowledge directly from a model’s internal representations.

The proposed CCS aims to find a direction in activation space that is consistent across negations — i.e., one that satisfies logical consistency properties, such as a statement and its negation having opposite truth values. The CCS workflow comprises four steps: 1) answer a set of yes–no questions both ways; 2) compute the model’s representation of each answer; 3) map each representation to a probability that the answer is true; and 4) optimize that mapping so the probabilities are both consistent and confident.
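The four steps above can be sketched in code. The following is a minimal toy illustration of the CCS objective — not the authors’ implementation — using synthetic activations in place of real model hidden states; the dimensions, learning rate, and data construction are all assumptions made for the example. A linear probe is trained so that the “yes” and “no” representations of the same statement get probabilities that sum to one (consistency) while avoiding the degenerate answer of 0.5 for both (confidence):

```python
import torch

def ccs_loss(p_pos, p_neg):
    # Consistency: a statement and its negation should have opposite
    # truth values, so p(yes) should be close to 1 - p(no).
    consistency = ((p_pos - (1.0 - p_neg)) ** 2).mean()
    # Confidence: discourage the degenerate solution p ≈ 0.5 for both.
    confidence = (torch.minimum(p_pos, p_neg) ** 2).mean()
    return consistency + confidence

torch.manual_seed(0)
n, d = 256, 32                                  # toy sizes (assumed)
truth = torch.randint(0, 2, (n,)).float() * 2 - 1   # +1 true, -1 false
direction = torch.randn(d)                      # synthetic "truth direction"
# Stand-ins for hidden states of each statement answered "yes" vs. "no".
x_pos = torch.randn(n, d) + truth[:, None] * direction
x_neg = torch.randn(n, d) - truth[:, None] * direction

probe = torch.nn.Linear(d, 1)                   # maps activations -> P(true)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = ccs_loss(torch.sigmoid(probe(x_pos)).squeeze(-1),
                    torch.sigmoid(probe(x_neg)).squeeze(-1))
    loss.backward()
    opt.step()

# The unsupervised objective leaves the sign of the direction
# unidentified, so evaluate accuracy up to a global flip.
with torch.no_grad():
    pred = (torch.sigmoid(probe(x_pos)).squeeze(-1) > 0.5).float() * 2 - 1
acc = max((pred == truth).float().mean().item(),
          (-pred == truth).float().mean().item())
print(f"accuracy (up to sign): {acc:.2f}")
```

Note that no labels are used during training — the probe is fit purely from the consistency and confidence terms, which is what makes the search unsupervised.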

In their empirical study, the team evaluated CCS across six models and ten question-answering datasets. The results show that CCS surpasses strong zero-shot baselines by an average of four percent and can cut prompt sensitivity in half while maintaining high accuracy — even when models are prompted to produce incorrect answers.
This work demonstrates the potential of unsupervised approaches for mitigating LLMs’ tendency to generate false text. The team sees their method as an initial step toward discovering latent knowledge when explicit ground-truth labels are unavailable.
The paper Discovering Latent Knowledge in Language Models Without Supervision is on arXiv.
Author: Hecate He | Editor: Michael Sarazen

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
