A foundation model is a type of artificial intelligence neural network trained on vast amounts of raw data, typically through unsupervised learning, and designed to be adaptable for a wide range of tasks.
In a new paper Apple Intelligence Foundation Language Models, an Apple research team introduces the foundation language models developed to power Apple Intelligence features. These models include a ∼3 billion parameter model optimized for efficient on-device performance and a larger server-based model designed for Private Cloud Compute.
At the 2024 Worldwide Developers Conference, Apple unveiled Apple Intelligence, a personal intelligence system deeply integrated into iOS 18, iPadOS 18, and macOS Sequoia. Apple Intelligence comprises highly capable generative models that are fast, efficient, and tailored to users' everyday needs, adapting in real time to their current activities.
The foundation models built into Apple Intelligence have been fine-tuned for a variety of user experiences, such as writing and refining text, prioritizing and summarizing notifications, creating playful images for conversations, and automating in-app actions to streamline interactions across different apps.
The report details how two key models—AFM-on-device, a ∼3 billion parameter language model, and AFM-server, a larger server-based language model—have been designed and optimized to perform specialized tasks with efficiency, accuracy, and a focus on user privacy.
The AFM base models are dense decoder-only models based on the Transformer architecture, incorporating several key design choices:
- A shared input/output embedding matrix to reduce memory usage.
- Pre-Normalization using RMSNorm for improved training stability.
- Query/key normalization to enhance training stability.
- Grouped-query attention (GQA) with eight key-value heads to minimize the KV-cache memory footprint.
- The SwiGLU activation function for increased efficiency.
- RoPE positional embeddings with a base frequency set to 500k to support long-context processing.
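Two of these choices can be made concrete with a short sketch: RoPE rotates pairs of channels by position-dependent angles derived from the base frequency (500k here), and GQA shrinks the KV cache because its size scales with the number of key-value heads rather than query heads. The dimensions below (head size, query-head count, sequence length) are illustrative assumptions, not figures from the report; only the 8 KV heads and the 500k RoPE base come from the paper.

```python
import numpy as np

# Assumed illustrative dimensions (NOT from the report):
d_head = 64          # per-head dimension
n_query_heads = 24   # query-head count
seq_len = 4096       # cached sequence length
# From the report:
n_kv_heads = 8       # grouped-query attention uses 8 key-value heads
rope_base = 500_000  # RoPE base frequency for long-context support

def rope_frequencies(d_head, base):
    """Per-channel-pair rotation frequencies for rotary embeddings."""
    return base ** (-np.arange(0, d_head, 2) / d_head)

def apply_rope(x, positions, base):
    """Rotate channel pairs of x (seq, d_head) by position-dependent angles."""
    freqs = rope_frequencies(x.shape[-1], base)
    angles = np.outer(positions, freqs)          # (seq, d_head // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin           # 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# KV-cache footprint per layer: keys + values, fp16, scales with KV heads.
bytes_per_value = 2
kv_cache_gqa = 2 * seq_len * n_kv_heads * d_head * bytes_per_value
kv_cache_mha = 2 * seq_len * n_query_heads * d_head * bytes_per_value
print(kv_cache_mha / kv_cache_gqa)  # reduction factor = n_query_heads / n_kv_heads
```

With the assumed 24 query heads, sharing 8 KV heads cuts the per-layer KV cache threefold, which is the memory saving the design targets for on-device inference.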
This report provides an overview of the model architecture, the training data, the training process, the optimization techniques for inference, and the evaluation results. The team also emphasizes their commitment to Responsible AI, detailing how ethical principles were integrated throughout the development of these models.
The paper Apple Intelligence Foundation Language Models is on arXiv.
Author: Hecate He | Editor: Chain Zhang

