Toward AGI: Microsoft’s KOSMOS-1 MLLM Can Perceive General Modalities, Follow Instructions, and Perform In-Context Learning

by Synced

2023-03-07

Large language models (LLMs) have emerged as powerful tools for a wide range of natural language processing (NLP) tasks. The push toward humanlike artificial general intelligence (AGI), however, will require equipping such models with additional capabilities, and multimodal perception is an essential next step.

In the new paper Language Is Not All You Need: Aligning Perception with Language Models, a Microsoft research team presents KOSMOS-1, a multimodal large language model (MLLM) that is able to perceive general modalities, learn in context, and follow instructions. KOSMOS-1 achieves impressive performance on language, perception-language, and vision tasks.

The researchers propose that LLMs with multimodal perception will be better equipped to acquire commonsense knowledge beyond what they can glean from text alone, and that this enriched perception will open up LLM applications in new domains such as robotics and document intelligence. They also argue that perception can unify various APIs, since a graphical user interface is the most natural and unified way to interact with a system.

KOSMOS-1 follows the MetaLM training process, where a transformer-based LLM acts as a general-purpose interface and is augmented with various perception modules. Consistent with the MetaLM philosophy, the team treats language models as a universal task layer, enabling KOSMOS-1 to unify various task predictions as texts and capably handle natural-language instructions and action sequences.
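One way to picture "task predictions as texts" is to render different tasks into the same prompt-and-continuation shape. The sketch below is illustrative only: the `<image>` placeholder, the template wording, and all function names are assumptions made for this example, not the templates or code used by the authors.

```python
from typing import List, Tuple

# Hypothetical marker for the position where image embeddings would be spliced in.
IMAGE_SLOT = "<image>"


def caption_prompt() -> str:
    """Captioning framed as text completion: the model continues the sentence."""
    return f"{IMAGE_SLOT} An image of"


def vqa_prompt(question: str) -> str:
    """Visual question answering framed as text completion after 'Answer:'."""
    return f"{IMAGE_SLOT} Question: {question} Answer:"


def few_shot_prompt(demos: List[Tuple[str, str]], question: str) -> str:
    """Prepend solved demonstrations so the task is specified in context."""
    shots = [f"{vqa_prompt(q)} {a}" for q, a in demos]
    return "\n".join(shots + [vqa_prompt(question)])


prompt = few_shot_prompt(
    [("What color is the bus?", "red")],
    "How many dogs are there?",
)
print(prompt)
```

Because every task ends in an open text continuation, a single language-model decoder can in principle serve all of them without task-specific output heads.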

Given a previous context, KOSMOS-1 learns to generate text in an autoregressive manner. All non-text input modalities are embedded and then fed into its backbone, a transformer-based causal language model, with the transformer decoder serving as a general-purpose interface for all modalities. By interacting with natural language and the other modalities, KOSMOS-1 naturally inherits the capabilities of in-context learning and instruction following, and can thus handle both language tasks and perception-intensive tasks.
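The input pipeline described above can be sketched in a few lines: interleaved text and image segments are flattened into one sequence of vectors, and a causal mask restricts each position to earlier positions. This is a toy sketch under stated assumptions: the boundary tokens (`<s>`, `<image>`, `</image>`), the embedding function, and the dimensions are placeholders chosen for illustration and do not reproduce the actual model.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union

EMBED_DIM = 4  # toy embedding width, chosen only for illustration


@dataclass
class Text:
    tokens: List[str]


@dataclass
class Image:
    # Stand-in for a vision encoder's output: one vector per image patch.
    patch_embeddings: List[List[float]]


Segment = Union[Text, Image]


def embed_token(token: str) -> List[float]:
    """Hypothetical embedding lookup: a deterministic toy vector per token."""
    seed = sum(ord(c) for c in token)
    return [((seed * (i + 1)) % 7) / 7.0 for i in range(EMBED_DIM)]


def flatten(segments: Sequence[Segment]) -> List[List[float]]:
    """Turn interleaved text and image segments into one sequence of vectors."""
    seq = [embed_token("<s>")]
    for seg in segments:
        if isinstance(seg, Text):
            seq.extend(embed_token(t) for t in seg.tokens)
        else:
            # Image vectors are wrapped in (assumed) boundary tokens.
            seq.append(embed_token("<image>"))
            seq.extend(seg.patch_embeddings)
            seq.append(embed_token("</image>"))
    return seq


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


segments = [
    Text(["An", "image", "of"]),
    Image([[0.1] * EMBED_DIM, [0.2] * EMBED_DIM]),
    Text(["shows"]),
]
sequence = flatten(segments)
mask = causal_mask(len(sequence))
print(len(sequence))
```

Once every modality is a vector in the same sequence, the decoder treats image positions like any other context position, which is what lets next-token prediction condition on them.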

In their empirical study, the team trained KOSMOS-1 on web-scale multimodal corpora and conducted experiments on a wide range of language and multimodal tasks, as well as the Raven IQ test. The team reports impressive performance across these tasks, which they present as evidence of the model's multimodal perception and nonverbal reasoning abilities.

In KOSMOS-1, the researchers introduce an MLLM with promising new capabilities and opportunities. In the future, they plan to equip KOSMOS-1 with speech and scale up its model size.

The paper Language Is Not All You Need: Aligning Perception with Language Models is on arXiv.

Author: Hecate He | Editor: Michael Sarazen
