October, 2024 | Synced

by Synced 2024-10-30

From OCR to Multi-Image Insight: Apple’s MM1.5 with Enhanced Text-Rich Image Understanding and Visual Reasoning

Building on MM1’s success, Apple’s new paper, MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning, introduces an improved model family that strengthens text-rich image understanding, visual grounding, and multi-image reasoning.

by Synced 2024-10-28

AI | Machine Learning & Data Science | Research

AI Self-Evolution: How Long-Term Memory Drives the Next Era of Intelligent Models

A research team investigates AI self-evolution. Their work examines how models enhanced with Long-Term Memory (LTM) can adapt and evolve through interaction with their environments, a key step toward achieving more dynamic AI.

by Synced 2024-10-25

AI | Machine Learning & Data Science | Research

Breaking Barriers in Cellular Automata with CAX: Faster, Scalable, and Open for All

In a new paper, CAX: Cellular Automata Accelerated in JAX, a research team introduces CAX, an open-source library designed to advance cellular automata (CA) research. It enables rapid CA simulations through extensive parallelization across CPUs, GPUs, and TPUs.

by Synced 2024-10-23

AI | Machine Learning & Data Science | Research

LLMs as Code Architects: Meta’s New Approach to Precise Code Transformations

In a new paper, Don’t Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs, a Meta research team proposes a chain-of-thought strategy for efficiently generating code transformations with LLMs. Their approach enables LLMs to derive transformations from a small set of input/output examples.

by Synced 2024-10-21

AI | Machine Learning & Data Science | Research

Thinking Fast and Slow: Google DeepMind’s Dual-Agent Architecture for Smarter AI

A Google DeepMind research team proposes a biologically inspired dual-system framework for intelligent agents. This “Talker-Reasoner” architecture follows Kahneman’s “thinking, fast and slow” model, in which System 1 is fast and intuitive while System 2 is slower and deliberative.

by Synced 2024-10-16

AI | Machine Learning & Data Science | Research

From Dense to Dynamic: NVIDIA’s Innovations in Upcycling LLMs to Sparse MoE

In a new paper, Upcycling Large Language Models into Mixture of Experts, an NVIDIA research team introduces a “virtual group” initialization technique that eases the conversion of dense models into fine-grained Mixture-of-Experts (MoE) structures.

by Synced 2024-10-12

AI | Machine Learning & Data Science | Research

Web Data to Real-World Action: Enabling Robots to Master Unseen Tasks

A research team presents a novel language-conditioned robot manipulation framework called Gen2Act, which achieves generalization to unseen tasks using publicly available web data, eliminating the need to collect specific robot data for every task.

by Synced 2024-10-09

AI | Machine Learning & Data Science | Research

Scaling Multi-Objective Optimization: Meta & FAIR’s CGPO Advances General-purpose LLMs

In a new paper, The Perfect Blend: Redefining RLHF with Mixture of Judges, a research team from Meta GenAI and FAIR develops Constrained Generative Policy Optimization (CGPO), a more structured approach to reinforcement learning from human feedback (RLHF) that advances the performance of general-purpose LLMs.

by Synced 2024-10-07

AI | Machine Learning & Data Science | Research

Instant 3D Vision: Apple’s Depth Pro Delivers High-Precision Depth Maps in 0.3 Seconds

Apple introduces Depth Pro, a state-of-the-art foundation model designed for zero-shot metric monocular depth estimation. This model can generate high-resolution depth maps with exceptional clarity and fine detail, producing a 2.25-megapixel depth map in just 0.3 seconds on a standard GPU.

by Synced 2024-10-03

AI | Machine Learning & Data Science | Research

Law of the Weakest Link: Advancing Large Language Models Through Cross-Capability

A joint research team from Meta and the University of Illinois Urbana-Champaign introduces CrossEval, a benchmark designed to assess both individual and cross capabilities. Their findings show that LLMs often follow the “Law of the Weakest Link”: performance on complex tasks is limited by the weakest capability involved.