Month: October 2024

AI Machine Learning & Data Science Research

From OCR to Multi-Image Insight: Apple’s MM1.5 with Enhanced Text-Rich Image Understanding and Visual Reasoning

Building on MM1’s success, Apple’s new paper, MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning, introduces an improved model family aimed at enhancing capabilities in text-rich image understanding, visual grounding, and multi-image reasoning.