New Study Tracks NLP Development and Direction Through a ‘World Scope’

Researchers “posit that the universes of knowledge and experience available to NLP models can be defined by successively larger world scopes: from a single corpus to a fully embodied and social context.”

by Synced

2020-04-28

Comments 12

The development of powerful pretrained language models and the tech’s virtually unlimited application and commercialization potential have made natural language processing (NLP) one of the hottest research areas in machine learning, with hardware and data breakthroughs driving performance leaps and leaderboard races across industry and academia.

This would seem an apt time to pause and reflect on the direction of NLP, and explore language in the broader AI, Cognitive Science, and Linguistics communities. In the new paper Experience Grounds Language, researchers “posit that the universes of knowledge and experience available to NLP models can be defined by successively larger world scopes: from a single corpus to a fully embodied and social context.” The distinguished group of researchers — including Turing Award winner Yoshua Bengio — hail from Carnegie Mellon University, University of Washington, MIT, MILA, University of Michigan, University of Edinburgh, DeepMind, University of Southern California, Semantic Machines, and MetaOptimize.

The limitations of corpora for covering language and experience have been recognized at various phases in the history of NLP research. In this study, researchers define the knowledge and experience available to NLP models using what they call a “World Scope” with five levels. The first, WS1, is the Corpus (our past), WS2 is the Internet (our present), WS3 is Perception, WS4 Embodiment, and WS5 Social.

A corpus has been the critical component in computer-aided, data-driven language research. The Penn Treebank for example is a large annotated corpus that includes over 4.5 million words of American English. The Penn Treebank was introduced in 1993 to study representations, as it was believed that text and spoken language understanding could be improved by automatically extracting information about language from very large corpora.

Fast forward to the first decade of the 21st century, new NLP tasks are introduced, and large web-crawls became viable. As the researchers note, “we are no longer constrained to a single author or source, and the temptation for NLP is to believe everything that needs knowing can be learned from the written world.” With NLP corpora expanded to include large web-crawls (WS2), deep models for learning transferable representations have advanced on a number of NLP benchmarks.

Unlike dictionaries, which define words in terms of other words, human beings can understand the essential meanings of many basic concepts. For example we directly learn what “heavy” and “soft” are by physically interacting with objects. While current state-of-the-art pretrained language models can generate coherent paragraphs of text, their word and sentence representations often fail to capture such grounded features of words. This is where WS3 (Perception) can help, as this level can include auditory input, tactile senses, and visual inputs.

The researchers propose the nuanced and contextual question “Is an orange more like a baseball or a banana?” to demonstrate World Scope levels, suggesting for example that WS3 can “begin to understand the relative deformability of these objects, but is likely to confuse how much force is necessary given that baseballs are used much more roughly than oranges in widely distributed media.”

Thanks to our rich representation of concepts derivable from perceptions, human beings can approach the question by simply acknowledging known facts — that an orange and a baseball share a similar shape, size and weight; that both oranges and bananas are edible, etc. The richness of our experience and knowledge might not be communicable by language, but it is essential to understanding language. This is what the WS4 (Embodiment) level aims at: “This intuitive knowledge could be acquired by embodied agents interacting with their environment, even before language words are grounded to meanings.”

Finally though, a learned agent needs to be tested. But how? The researchers stress the ultimate importance of coming to understand social context, which includes status, role, intent and countless other variables. The researchers believe AI agents will not be able to obtain social intelligence with only a fixed corpus to draw from. Instead, social interaction will provide a crucial signal for the agents, who will learn by participating.

The study describes an interesting roadmap that tracks the journey of NLP research so far, and points to the contextualization of language in human experiences as a target for future studies.

The paper Experience Grounds Language is on arXiv.

Journalist: Fangyu Cai | Editor: Michael Sarazen

12 comments on “New Study Tracks NLP Development and Direction Through a ‘World Scope’”

Milania

2021-11-21

The role of a software developer is important for any company specializing in software, but even more important for the DevOps team. In DevOps, the software developer is not only responsible for developing the code; they also test, deploy, monitor and maintain this code, check this page to learn more

Loading...

Reply
paulyoon1

2022-05-03

Executive summary
The purpose of the executive summary is to sell. It should be attention-getting, and highlight impressive company attributes. Your introduction should be crafted in such a way that potential clients will want to read the entire document, and not just the executive summary.Visit https://kingessays.com/article-critique.php for more details. If your company has received awards and accolades, highlight them in the executive summary. This can also serve as the foundation for a conversation on the potential partnership between your company and the client.

Loading...

Reply
Brenda Gray

2023-07-27

I’m particularly interested in the potential applications of these developments in various sectors, including the integration of NLP into mobile app development. For instance, react native services could benefit from enhanced NLP capabilities to improve user interactions, understand user preferences, and deliver more personalized experiences. The combination of powerful language models with React Native services can lead to exciting and innovative solutions for mobile applications.

Loading...

Reply
Garry

2023-09-22

For me, it is necessary to deal only with people who know how to do their job. If you are interested, you can read about rfid applications. I was interested in this article as it can help in the development of my business.

Loading...

Reply
Brandon Stark

2023-10-23

Nursing students often face the daunting task of writing assignments alongside their rigorous coursework and clinical training. These assignments are essential for reinforcing theoretical knowledge and developing critical thinking and communication skills. While they can be challenging, writing assignments https://nursingwriting.org/nursing-essay-writing/ are a crucial part of a nursing student’s education, preparing them for the demands of their future profession. They offer an opportunity to delve into healthcare topics, conduct research, and articulate their thoughts and ideas effectively. Though it may be a demanding aspect of their education, it’s an invaluable one that contributes to their growth as future healthcare professionals.

Loading...

Reply
Andy Kayfman

2024-03-27

It’s not just about strategy; it’s like a mental workout. I love the feeling of anticipation before each move, trying to predict my opponent’s next step like with https://www.nursingpaper.com/examples/implementation-of-a-new-electronic-health-record-system-essay/ It’s a game that keeps your mind sharp and always offers new challenges. Whether you’re a beginner or a grandmaster, chess has something to offer for everyone

Loading...

Reply
Pingback: Theme 10
John Smith

2024-05-23

Moreover, Saleem encourages a mindset of lifelong learning and critical thinking. By advising students to continually seek out and evaluate new resources: royalwriter top essay writing services , he promotes adaptability and a commitment to self-improvement. These skills are essential in today’s fast-paced world, where ongoing learning and flexibility are crucial for success.

Loading...

Reply
bensonclark

2024-07-02

Researchers are taking a new approach to tracking advancements in Natural Language Processing (NLP) with the concept of “World Scope.” This framework defines the knowledge and experience available to NLP models by assigning them increasing levels of scope. The most basic level (WS1) represents a single corpus, like a collection of text documents. As we move up the scale, models can access information from the entire internet (WS2), incorporate real-world perception (WS3), and even achieve embodiment and social interaction (WS4 & WS5). This “World Scope” eddie murphy red suit provides a valuable lens to assess NLP progress and identify areas where future research can push the boundaries of language models, allowing them to interact with and understand the world in increasingly complex ways.

Loading...

Reply
Alex James

2024-08-06

Its great post about NLP! I am exciting to see how advancements in machine learning and data processing are driving innovation in NLP. Although I would like to share this Race for Glory 2024 Gianmaria Martini White Racing Jacket I recently purchased. Its sleek design is perfect for motorsports fans.

Loading...

Reply
James Will

2024-09-12

Amazing! AI is shaping the future across different regions. Just like that fashion trend is changing rapidly, so I would like to share this Fear Of God Hoodie which looks cool and comfortable.

Loading...

Reply
Wells Done

2025-09-24

I was staring at the log files in Chicago, each line more confusing than the last, tracking latency spikes of over 6 seconds during peak hours. The panic was real—users were leaving! Browsing through the site, I found a detailed breakdown of load balancing strategies with real metrics from 2025. Applying just a few tweaks cut latency to under 1.5 seconds, and the system stabilized. I dropped n-ix.com in the middle so it would be instantly accessible for future crises. So if you have similar bottlenecks, I recommend this.

Loading...

Reply