Toward AGI: Microsoft’s KOSMOS-1 MLLM Can Perceive General Modalities, Follow Instructions, and Perform In-Context Learning

Large language models (LLMs) have emerged as powerful tools for a wide range of natural language processing (NLP) tasks. The push toward humanlike artificial general intelligence (AGI), however, will require equipping such models with additional capabilities, and multimodal perception is an essential next step.

In the new paper Language Is Not All You Need: Aligning Perception with Language Models, a Microsoft research team presents KOSMOS-1, a multimodal large language model (MLLM) that is able to perceive general modalities, learn in context, and follow instructions. KOSMOS-1 achieves impressive performance on language, perception-language, and vision tasks.

The researchers propose that LLMs with multimodal perception will be better equipped to acquire commonsense knowledge beyond the information they glean from text alone, and that this perception enrichment will facilitate LLM applications in new domains such as robotics and document intelligence. Multimodal perception also has the benefit of unifying multiple APIs to form a single general graphical user interface.

KOSMOS-1 follows the MetaLM training process, where a transformer-based LLM acts as a general-purpose interface and is augmented with various perception modules. Consistent with the MetaLM philosophy, the team treats language models as a universal task layer, enabling KOSMOS-1 to unify various task predictions as texts and capably handle natural-language instructions and action sequences.

Given a previous context, KOSMOS-1 learns to generate texts in an autoregressive manner. All non-text input modalities are embedded and then fed into its backbone transformer-based causal language model, with the transformer decoder serving as a general-purpose interface for all modalities. By interacting with natural language and the other modalities, KOSMOS-1 naturally inherits the capabilities of in-context learning and instruction following, and can thus handle both language and perception-intensive tasks.
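The input handling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace "tokenizer", the toy two-dimensional image vectors, and the `<s>`, `</s>`, `<image>` and `</image>` boundary markers are all assumptions chosen for illustration. It only shows the general idea of flattening interleaved text and embedded image segments into one sequence, and the causal (left-to-right) visibility rule that an autoregressive decoder applies over that sequence.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical types for illustration only; not taken from the paper's code.
ImageEmbeddings = List[List[float]]      # one vector per embedded image position
Segment = Union[str, ImageEmbeddings]    # a text span or an embedded image


def flatten_segments(segments: Sequence[Segment]) -> List[Tuple[str, object]]:
    """Flatten interleaved text and image segments into a single sequence.

    Every position is tagged as ("text", token) or ("image", vector) so a
    decoder could look up a text embedding or use the image vector directly.
    The boundary markers below are assumed special tokens.
    """
    sequence: List[Tuple[str, object]] = [("text", "<s>")]
    for segment in segments:
        if isinstance(segment, str):
            # Naive whitespace split stands in for a real tokenizer.
            sequence.extend(("text", token) for token in segment.split())
        else:
            sequence.append(("text", "<image>"))
            sequence.extend(("image", vector) for vector in segment)
            sequence.append(("text", "</image>"))
    sequence.append(("text", "</s>"))
    return sequence


def causal_mask(length: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


if __name__ == "__main__":
    toy_image = [[0.1, 0.2], [0.3, 0.4]]  # two made-up embedding vectors
    seq = flatten_segments(["A photo of", toy_image, "a cat"])
    mask = causal_mask(len(seq))
    print([kind for kind, _ in seq])
    print(mask[0], mask[-1], sep="\n")
```

Because text and image positions share one sequence and one mask, each prediction can condition on everything to its left regardless of modality, which is the sense in which the decoder acts as a single interface.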

In their empirical study, the team trained KOSMOS-1 on web-scale multimodal corpora and conducted experiments on a wide range of language and multimodal tasks, as well as the Raven IQ test. KOSMOS-1 achieved impressive performance on all tasks, demonstrating its strong multimodal perception and nonverbal reasoning abilities.

With KOSMOS-1, the researchers introduce an MLLM with promising new capabilities and opportunities. In the future, they plan to equip KOSMOS-1 with speech capabilities and scale up its model size.

The paper Language Is Not All You Need: Aligning Perception with Language Models is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


