A pair of groundbreaking research initiatives from Meta AI in late 2024 is challenging the fundamental “next-token prediction” paradigm that underpins most of today’s large language models (LLMs). The BLT (Byte Latent Transformer) architecture eliminates the need for tokenizers and shows significant promise for multimodal alignment and fusion. Alongside it, Meta unveiled the Large Concept Model (LCM), which goes a step further by discarding tokens entirely, aiming to bridge the gap between symbolic and connectionist AI through direct reasoning and generation in a semantic “concept” space. These developments have ignited discussion within the AI community, with many suggesting they could mark a new era in LLM design.
Both lines of research explore the latent space of models, seeking to rework their internal representations and support reasoning processes more aligned with human cognition. The work stems from the observation that current LLMs, open and closed source alike, lack an explicit hierarchical structure for processing and generating information at an abstract level, independent of any specific language or modality.
The prevailing “next-token prediction” approach in traditional LLMs gained traction largely due to its relative ease of engineering implementation and its demonstrated effectiveness in practice. This method addresses the necessity for computers to process discrete numerical representations of text, with tokens serving as the simplest and most direct way to achieve this conversion into vectors for mathematical operations. Ilya Sutskever, in a conversation with Jensen Huang, previously suggested that predicting the next word allows models to grasp the underlying real-world processes and emotions, leading to the formation of a “world model.”
However, critics argue that using a discrete symbolic system to capture the continuous and complex nature of human thought is inherently flawed, as humans do not think in tokens. Human problem-solving and long-form content creation often involve a hierarchical approach, starting with a high-level plan of the overall structure before gradually adding details. For instance, when preparing a speech, individuals typically outline core arguments and the flow, rather than pre-selecting every word. Similarly, writing a paper involves creating a framework with chapters that are then progressively elaborated upon. Humans can also recognize and remember the relationships between different parts of a lengthy document at an abstract level.
Meta’s LCM directly addresses this by enabling models to learn and reason at an abstract conceptual level. Instead of tokens, both the input and output of the LCM are “concepts.” This approach has demonstrated superior zero-shot cross-lingual generalization capabilities compared to other LLMs of similar size, generating significant excitement within the industry.
Yuchen Jin, CTO of Hyperbolic, commented on social media that he is increasingly convinced tokenization will disappear, with LCM replacing “next-token prediction” with “next-concept prediction.” He intuitively believes LCM may excel in reasoning and multimodal tasks. The LCM has also sparked considerable discussion among Reddit users, who view it as a potential new paradigm for AI cognition and eagerly anticipate the synergistic effects of combining LCM with Meta’s other initiatives like BLT, JEPA, and Coconut.
How Does LCM Learn Abstract Reasoning Without Predicting the Next Token?
The core idea behind LCM is to perform language modeling at a higher level of abstraction, adopting a “concept-centric” paradigm. LCM operates with two levels of abstraction: subword tokens and concepts. A “concept” is defined as a language- and modality-agnostic abstract entity representing a higher-level idea or action, typically corresponding to a sentence in a text document or an equivalent spoken utterance. In essence, LCM learns “concepts” directly, training a transformer on sequences of sentence-level concept vectors rather than token sequences.
To train on these higher-level abstract representations, LCM uses SONAR, a previously released Meta model for multilingual and multimodal sentence embeddings, as a translation layer. SONAR converts token sequences into concept vectors (and back), so the LCM’s inputs and outputs are concept vectors and the model can learn higher-level semantic relationships directly. While SONAR acts as the bridge between tokens and concepts (it is kept frozen and is not trained with the LCM), the researchers explored three architectures for processing these “concept” units: Base-LCM, Diffusion-based LCM, and Quantized LCM.
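To make the token-to-concept bridge concrete, here is a minimal sketch of the round trip. The pipeline class names follow those documented in Meta’s open-source SONAR repository (facebookresearch/SONAR), but treat the exact signatures as assumptions that may differ across versions:

```python
# Minimal sketch of the SONAR token <-> concept round trip.
# Class and argument names are taken from the facebookresearch/SONAR
# repo's documented usage; verify against the installed version.
from sonar.inference_pipelines.text import (
    TextToEmbeddingModelPipeline,
    EmbeddingToTextModelPipeline,
)

# Encoder: sentences -> fixed-size concept vectors in the SONAR space.
t2vec = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)
# Decoder: concept vectors -> text, in any SONAR-supported language.
vec2text = EmbeddingToTextModelPipeline(
    decoder="text_sonar_basic_decoder",
    tokenizer="text_sonar_basic_encoder",
)

sentences = ["Concepts are language-agnostic.", "The LCM predicts the next one."]
concepts = t2vec.predict(sentences, source_lang="eng_Latn")  # one vector per sentence
roundtrip = vec2text.predict(concepts, target_lang="eng_Latn", max_seq_len=64)
print(roundtrip)  # approximate reconstructions of the input sentences
```

Because the decoder accepts any `target_lang`, the same concept vector can in principle be decoded into a different language than it was encoded from, which is what makes the concept space language-agnostic.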
Base-LCM, the foundational architecture, employs a standard decoder-only Transformer to predict the next concept (sentence embedding) in the embedding space, directly minimizing a Mean Squared Error (MSE) loss to regress the target sentence embedding. Small PreNet and PostNet layers normalize the incoming SONAR embeddings and map them into and out of the model’s hidden dimension. The Base-LCM workflow segments the input into sentences, encodes each sentence into a concept (sentence embedding) with SONAR, processes the resulting concept sequence with the LCM to predict new concepts, and finally decodes the generated concepts back into subword tokens with SONAR. While structurally clear and relatively stable to train, this approach risks information loss, as all semantic information must pass through the intermediate concept vectors.
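The following PyTorch sketch illustrates the Base-LCM objective under stated assumptions: the module names, dimensions, and layer counts are illustrative, not Meta’s implementation. The key point is that the loss is a regression in embedding space rather than a cross-entropy over a token vocabulary:

```python
# Illustrative sketch of the Base-LCM objective (not Meta's implementation):
# a decoder-only Transformer regresses the next concept vector with MSE.
import torch
import torch.nn as nn

class BaseLCM(nn.Module):
    def __init__(self, sonar_dim=1024, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        self.prenet = nn.Linear(sonar_dim, d_model)   # concept space -> model space
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.postnet = nn.Linear(d_model, sonar_dim)  # model space -> concept space

    def forward(self, concepts):  # concepts: [batch, seq, sonar_dim]
        T = concepts.size(1)
        # causal mask so position t only sees concepts <= t
        causal = torch.triu(
            torch.full((T, T), float("-inf")), diagonal=1
        ).to(concepts.device)
        h = self.backbone(self.prenet(concepts), mask=causal)
        return self.postnet(h)    # predicted next concept at each position

model = BaseLCM()
seq = torch.randn(2, 16, 1024)                   # a batch of concept sequences
pred = model(seq[:, :-1])                        # predict concept t+1 from concepts <= t
loss = nn.functional.mse_loss(pred, seq[:, 1:])  # MSE regression, not cross-entropy
loss.backward()
```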
Quantized LCM addresses continuous data generation by discretizing it. This architecture uses Residual Vector Quantization (RVQ) to quantize the concept layer provided by SONAR and then models the discrete units. By using discrete representations, Quantized LCM can reduce computational complexity and offers advantages in processing long sequences. However, mapping continuous embeddings to discrete codebook units can potentially lead to information loss or distortion, impacting accuracy.
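The core of residual vector quantization is easy to sketch: each stage quantizes whatever error the previous stages left behind, so the reconstruction is a sum of codewords. The toy below is illustrative only (random codebooks, hypothetical shapes), not the paper’s codec:

```python
# Toy residual vector quantization (RVQ), illustrative only:
# each codebook quantizes the residual left by the previous stages.
import torch

def rvq_encode(x, codebooks):
    """x: [B, D]; codebooks: list of [K, D] tensors. Returns codes + reconstruction."""
    residual, codes, recon = x.clone(), [], torch.zeros_like(x)
    for cb in codebooks:
        dists = torch.cdist(residual, cb)  # distance to every codeword, [B, K]
        idx = dists.argmin(dim=1)          # nearest codeword per vector, [B]
        chosen = cb[idx]                   # [B, D]
        codes.append(idx)
        recon = recon + chosen
        residual = residual - chosen       # pass the leftover error to the next stage
    return codes, recon

torch.manual_seed(0)
concepts = torch.randn(4, 1024)                          # stand-ins for concept vectors
codebooks = [torch.randn(256, 1024) for _ in range(8)]   # 8 stages x 256 codes each
codes, recon = rvq_encode(concepts, codebooks)
print(len(codes), (concepts - recon).norm(dim=1))        # remaining error after 8 stages
```

Each concept is thus represented by a short tuple of discrete indices (here 8 per vector), which is what lets the model treat continuous embeddings as discrete units.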
Diffusion-based LCM treats generation as an autoregressive process that produces concepts sequentially within a document, using a diffusion model to generate each sentence embedding. Two main variations were explored:
- One-Tower Diffusion LCM: A single Transformer backbone is tasked with predicting clean sentence embeddings given noisy inputs; during training, clean and noisy embeddings are interleaved so each noisy position can be denoised given its clean preceding context.
- Two-Tower Diffusion LCM: This separates the encoding of the context from the diffusion of the next embedding. The first model (contextualizer) causally encodes context vectors, while the second model (denoiser) predicts clean sentence embeddings through iterative denoising.
Among the explored variations, the Two-Tower Diffusion LCM’s separated structure handles long contexts more efficiently and leverages cross-attention during denoising to exploit contextual information, showing superior performance on abstractive summarization and long-context reasoning tasks.
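A skeletal sketch of the Two-Tower split follows, with illustrative module names and a crude refinement loop in place of a real noise schedule; the actual model’s layers, conditioning, and sampler are more elaborate:

```python
# Skeletal Two-Tower Diffusion LCM (illustrative; the real model differs).
# Tower 1 (contextualizer) causally encodes the preceding concepts;
# Tower 2 (denoiser) cross-attends to that context while iteratively
# refining a noisy candidate for the next concept.
import torch
import torch.nn as nn

class TwoTowerLCM(nn.Module):
    def __init__(self, dim=1024, d_model=512, heads=8, layers=4):
        super().__init__()
        self.inp = nn.Linear(dim, d_model)
        enc = nn.TransformerEncoderLayer(d_model, heads, batch_first=True)
        self.contextualizer = nn.TransformerEncoder(enc, layers)
        dec = nn.TransformerDecoderLayer(d_model, heads, batch_first=True)
        self.denoiser = nn.TransformerDecoder(dec, layers)
        self.out = nn.Linear(d_model, dim)

    def denoise_step(self, noisy_next, context):
        T = context.size(1)
        causal = torch.triu(
            torch.full((T, T), float("-inf")), diagonal=1
        ).to(context.device)
        ctx = self.contextualizer(self.inp(context), mask=causal)
        h = self.denoiser(self.inp(noisy_next), ctx)  # cross-attention to context
        return self.out(h)                            # predicted clean concept

model = TwoTowerLCM()
context = torch.randn(2, 8, 1024)        # 8 previous concepts
x = torch.randn(2, 1, 1024)              # start the next concept from pure noise
for _ in range(10):                      # crude refinement loop for illustration
    x = model.denoise_step(x, context)   # a real sampler follows a noise schedule
```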
What Future Possibilities Does LCM Unlock?
Meta’s Chief AI Scientist and FAIR Director, Yann LeCun, described LCM in a December interview as the blueprint for the next generation of AI systems. LeCun envisions a future where goal-driven AI systems possess emotions and world models, with LCM being a crucial component in realizing this vision.
LCM’s mechanism of encoding entire sentences or paragraphs into high-dimensional vectors and directly learning and outputting concepts enables AI models to think and reason at a higher level of abstraction, similar to humans, thereby unlocking more complex tasks.
Alongside LCM, Meta also released BLT and Coconut, both representing explorations into the latent space. BLT eliminates the need for tokenizers by grouping raw bytes into dynamically sized patches, segmented where the next byte becomes hard to predict, which allows different modalities to be represented as bytes and makes language-model input more flexible. Coconut (Chain of Continuous Thought) modifies the latent space representation so that models can reason in a continuous latent space rather than through discrete token chains.
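The patching idea can be sketched simply: a small byte-level model scores the entropy of each next byte, and a new patch starts whenever that entropy spikes. The snippet below is an illustrative toy; `next_byte_entropy` is a hypothetical stand-in for the small byte LM, not Meta’s implementation:

```python
# Toy sketch of BLT-style entropy patching (illustrative, not Meta's code).
# A small byte-level LM would supply next-byte entropies; here we fake them.

def next_byte_entropy(prefix: bytes) -> float:
    """Hypothetical stand-in for a byte LM's predicted next-byte entropy."""
    # Pretend bytes right after a space are hard to predict (new word ahead).
    return 4.0 if prefix.endswith(b" ") else 1.0

def entropy_patches(data: bytes, threshold: float = 3.0):
    """Start a new patch wherever predicted next-byte entropy exceeds threshold."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(data[:i]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

print(entropy_patches(b"byte latent transformers patch bytes"))
# -> word-like patches: predictable spans stay long, surprising points split
```

The payoff is that easy stretches of bytes collapse into long patches (cheap to process), while compute concentrates at genuinely unpredictable positions, without any fixed vocabulary.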
Meta’s series of innovations in latent space has sparked a significant debate within the AI community regarding the potential synergies between LCM, BLT, Coconut, and Meta’s previously introduced JEPA (Joint Embedding Predictive Architecture).
An analysis on Substack suggests that the BLT architecture could serve as a scalable encoder and decoder within the LCM framework. Yuchen Jin echoed this sentiment, noting that while LCM’s current implementation relies on SONAR, which still uses token-level processing to build the sentence embedding space, he is eager to see the outcome of an LCM+BLT combination. Reddit users have speculated about future robots conceptualizing daily tasks through LCM, reasoning about them with Coconut, and adapting to real-world changes via JEPA.
These developments from Meta signal a potential paradigm shift in how large language models are designed and trained, moving beyond the established “next-token prediction” approach towards more abstract and human-like reasoning capabilities. The AI community will be closely watching the further development and integration of these novel architectures.
The paper Large Concept Models: Language Modeling in a Sentence Representation Space is on arXiv.
