
Exploring MLX Swift: Adding On-Device Inference to your App

44% OFF - Black Friday Sale

Use the coupon "BLACKFRIDAY" at the checkout page for AI Driven Coding, Foundation Models, MLX, MusicKit, freelancing, or technical writing.

I like the output of programming more than the process itself, and that applies to MLX Swift too. So, in this post, you will learn how to easily add on-device inference to your existing app.

What is MLX, Though?

MLX is a machine learning framework by Apple, designed specifically for Apple silicon. It lets you run language models locally on your devices, which means no round trips to a server and better privacy for your users. It also works offline, so you can keep using models on flights or trains where internet connectivity is minimal.

MLX Swift brings MLX to the Swift language, so iOS developers like us do not have to spend time indenting Python code.

GitHub - ml-explore/mlx-swift: Swift API for MLX

We will learn how to use the MLXLMCommon package, a library that provides a simple way to run pre-trained Large Language Models (LLMs) on device. By the end of this post, you will be able to generate responses locally in your app.

TL;DR: Getting Started

Here are the steps involved in adding a model to your application:

  • Add the MLX Swift LM Package: Add the MLXLMCommon package to your project. This provides all the model logic, utilities and helper classes to download and perform inference. I usually avoid dependencies but this one definitely makes my life easier.
  • Choose a Model: Pick a pre-trained model to run on device. For this post, I will choose a lightweight one so it is easier to get started with.
  • Load the Model: Use the loadModel function to download and load the model.
  • Create a ChatSession: Create a ChatSession to manage the conversation.
  • Run inference: Use respond(to:) or streamResponse(to:) to generate output.

Note: The code in this post does not follow best practices for iOS development. The idea is to get started with MLX Swift in under 15 lines of code and see the output of on-device inference visually.

Adding the MLX Swift LM Package

Add the MLX Swift LM repo as a package dependency to your project:

  • In Xcode, open your project and go to the Project Settings -> Package Dependencies
  • Press + and paste this: https://github.com/ml-explore/mlx-swift-lm/ and select Add Package
  • Set the Dependency Rule to Branch, with main as the branch
  • Add MLXLMCommon to your desired target

Next, import the package in the file where you plan to use it:

import MLXLMCommon

Choosing a Model

Models are loaded directly by their Hugging Face ID. For example:

// A small model that works well on many devices
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")

These models are hosted on the Hugging Face Hub. For the initial example, let us use Qwen3-4B-4bit, a 4-bit quantized model small enough to run on most devices.

Loading the Model and Creating a ChatSession

The loadModel function takes care of downloading the model files, creating the model using the correct architecture, and loading the weights from a .safetensors file:

let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)

  • loadModel(id:) downloads and loads the model asynchronously.
  • ChatSession manages the conversation context and KV cache for you.

The new ChatSession API takes inspiration from Apple's Foundation Models framework, providing a familiar interface for Swift developers working with on-device language models.
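Because ChatSession keeps the conversation context between calls, a follow-up prompt can refer back to an earlier turn. A minimal sketch using the respond(to:) method covered below:

```swift
import MLXLMCommon

// The session remembers earlier turns, so "it" in the
// second prompt resolves to MLX from the first answer.
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)

let first = try await session.respond(to: "In one sentence, what is MLX?")
let followUp = try await session.respond(to: "Who created it?")
```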

Run Inference

Generate a response using either respond(to:) for a complete response or streamResponse(to:) for streaming:

let prompt = "What is really the meaning of life?"
 
// Option 1: Get complete response
let response = try await session.respond(to: prompt)
 
// Option 2: Stream the response
for try await text in session.streamResponse(to: prompt) {
    print(text, terminator: "")
}

  • respond(to:) returns the complete response as a single string.
  • streamResponse(to:) returns an AsyncThrowingStream that yields text chunks as they are generated, perfect for real-time UI updates.

Complete Code Example

Here is the code from above combined into a single view with a generate function:

import SwiftUI
import MLXLMCommon
 
struct ContentView: View {
    @State private var output: String = ""
 
    var body: some View {
        VStack {
            Image(systemName: "globe")
                .imageScale(.large)
                .foregroundStyle(.tint)
            Text("Hello, world!")
 
            Text(output)
        }
        .padding()
        .task {
            do {
                try await generate()
            } catch {
                debugPrint(error)
            }
        }
    }
}
 
extension ContentView {
    private func generate() async throws {
        let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
        let session = ChatSession(model)
 
        let prompt = "What is really the meaning of life?"
 
        for try await text in session.streamResponse(to: prompt) {
            self.output += text
        }
    }
}

You can use this as boilerplate code to get started with MLX Swift in your app!

Moving Forward

This is a minimal version of using MLXLMCommon. You can now experiment with:

  • Different models: Try other models from the mlx-community on Hugging Face
  • System instructions: Pass instructions to ChatSession to customize the model's behavior
  • Generation parameters: Use GenerateParameters to control temperature, token limits, and more
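As a sketch of those last two points, assuming ChatSession accepts an instructions argument and that GenerateParameters exposes maxTokens and temperature (the exact labels may differ between package versions, so check the MLXLMCommon source you are building against):

```swift
import MLXLMCommon

// Hypothetical parameter labels: verify against the
// MLXLMCommon version in your project.
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")

let session = ChatSession(
    model,
    instructions: "You are a terse assistant. Answer in one sentence.",
    generateParameters: GenerateParameters(maxTokens: 256, temperature: 0.7)
)

let reply = try await session.respond(to: "What is on-device inference?")
```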

You are ready to start building applications with on-device language models. In my next post, I will show you how to configure a different model and start generating text, and you will be surprised at how few lines of code it takes!

If you are working with MLX Swift, I would love to hear about your experiences. Reach out on X @rudrank or Bluesky @rudrankriyam.bsky.social!

Happy MLXing!


