Exploring MLX Swift: Adding On-Device Inference to your App
I love the output of programming more than the process itself, and that applies to MLX Swift too. So, in this post, you will learn how to easily add on-device inference to your existing app.
What is MLX, Though?
MLX is a machine learning framework by Apple, designed specifically for Apple silicon. It lets you run language models locally on your devices. This means no round trips to a server and better privacy for your users. It also means you can use them on flights or trains where the internet is slow or unavailable.
MLX Swift expands MLX to the Swift language, so iOS developers like us do not have to spend time indenting Python code.
We will learn how to use the `MLXLLM` package, a library designed to provide a simple way to use pre-trained Large Language Models (LLMs) for on-device inference. By the end of this post, you will be able to generate responses locally in your app.
TL;DR: Getting Started
Here are the few steps involved in adding a model to your application:

- Add the MLX Swift Examples package: Add the `MLXLLM` package to your project. It provides all the model logic, utilities, and helper classes to download models and perform inference. I usually avoid dependencies, but this one definitely makes my life easier.
- Choose a model: Pick a pre-trained model to run on device. For this post, I will choose a lightweight one so it is easier to get started with.
- Load the model: Use an existing configured model from `MLXLLM` to load the weights and configuration.
- Create input: Create an input (prompt) for the model.
- Run inference: Feed this prompt into the model and generate the output.
Note: The code below does not follow iOS development best practices. The idea is to get started with MLX Swift in about 20 lines of code and see the output of on-device inference visually.
Adding the MLX Swift Examples Package
Add the MLX Swift Examples repo as a package dependency to your project:

- In Xcode, open your project and go to Project Settings -> Package Dependencies
- Press `+` and paste this URL: https://github.com/ml-explore/mlx-swift-examples/, then select Add Package
- Set the `Dependency Rule` to `Branch` and the branch to `main`
- Add `MLXLLM` to your desired target
Next, import the packages in the file where you want to use them:

```swift
import MLXLLM
import MLXLMCommon
```
Choosing a Model
The `MLXLLM` package provides constants for popular models, such as:

```swift
// A small model that works well on many devices
let modelConfiguration = ModelRegistry.llama3_2_1B_4bit
```

These models are hosted on the Hugging Face Hub, and `MLXLLM` knows where to download the weights and configuration for each of them. For the initial example, let us use `llama3_2_1B_4bit` so the model runs on most devices.
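If you later want a heavier model on capable hardware, you could choose at runtime based on available memory. A minimal sketch of that decision; the `llama3_2_3B_4bit` identifier and the helper function are assumptions for illustration (only the 1B constant appears in this post, so check `ModelRegistry` for the actual names):

```swift
import Foundation

// Hypothetical helper: choose a model identifier based on available RAM.
// The 3B identifier is an assumption; verify the constant names in ModelRegistry.
func suggestedModel(physicalMemory: UInt64) -> String {
    let eightGB: UInt64 = 8 * 1024 * 1024 * 1024
    return physicalMemory >= eightGB ? "llama3_2_3B_4bit" : "llama3_2_1B_4bit"
}

// Pick based on the current device:
let model = suggestedModel(physicalMemory: ProcessInfo.processInfo.physicalMemory)
```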
Loading the Model
Download the model weights and set up the model with the following:
```swift
let modelConfiguration = ModelRegistry.llama3_2_1B_4bit
let modelContainer = try await LLMModelFactory.shared.loadContainer(
    configuration: modelConfiguration
) { progress in
    debugPrint("Downloading \(modelConfiguration.name): \(Int(progress.fractionCompleted * 100))%")
}
```
- `LLMModelFactory.shared.loadContainer` takes care of downloading the model files, creating the model with the correct architecture, and loading the weights from a `.safetensors` file. Note that `loadContainer` is an asynchronous method, so use `await`.
- The method also takes a closure that reports `progress`, which you can use to update the UI as needed during the download.
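Since the same progress string is useful both for `debugPrint` and for a SwiftUI label, you could pull it into a small helper. This is a sketch; the function name is mine, and `fractionCompleted` comes from Foundation's `Progress` type, which the download closure receives:

```swift
import Foundation

// Formats the download progress exactly as in the closure above,
// so the same text can also drive a SwiftUI Text view.
func progressLabel(modelName: String, fractionCompleted: Double) -> String {
    "Downloading \(modelName): \(Int(fractionCompleted * 100))%"
}

// For example, at 42% completion:
print(progressLabel(modelName: "llama3_2_1B_4bit", fractionCompleted: 0.42))
// Prints: Downloading llama3_2_1B_4bit: 42%
```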
Prepare Input
Convert the prompt into input the model understands, which is done using the `prepare` method on the processor in `ModelContext`:
```swift
let prompt = "What is really the meaning of life?"

let result = try await modelContainer.perform { [prompt] context in
    let input = try await context.processor.prepare(input: .init(prompt: prompt))
}
```
- `perform` on `ModelContainer` runs the block on an `actor`, giving you thread-safe access to the model and the tokenizer inside it.
- The code assigns the result to `input`, which is of type `LMInput`.
- The `prompt` is turned into a set of tokens by `context.processor.prepare(input:)`.
Run Inference
Run the inference and use the text that is produced:
```swift
return try MLXLMCommon.generate(
    input: input, parameters: .init(), context: context
) { tokens in
    let text = context.tokenizer.decode(tokens: tokens)
    Task { @MainActor in
        self.output = text
    }
    return .more
}
```
- We use the `context` that the `perform` block provides.
- The `generate` method runs the inference loop that produces text from the tokens.
- The closure is called with the tokens as they are produced, which allows the display of intermediate values.
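Returning `.more` from the callback tells the loop to keep going. A minimal sketch of a token-budget stop condition, using a stand-in enum for the library's actual return type (assumption: returning `.stop` from the callback halts generation, as in `MLXLMCommon`):

```swift
// Stand-in for the generate callback's return type:
// .more continues the loop, .stop halts it.
enum StreamDecision {
    case more, stop
}

// Stop once a maximum number of tokens has been produced.
func decision(tokenCount: Int, maxTokens: Int) -> StreamDecision {
    tokenCount >= maxTokens ? .stop : .more
}
```

Inside the real callback you would check `tokens.count` against your budget and return the library's equivalent values.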
Complete Code Example
Here is the code shown above combined into a single view with a `generate` function:
```swift
import SwiftUI
import MLXLLM
import MLXLMCommon

struct ContentView: View {
    @State private var output: String = ""

    var body: some View {
        VStack {
            Image(systemName: "globe")
                .imageScale(.large)
                .foregroundStyle(.tint)
            Text("Hello, world!")
            Text(output)
        }
        .padding()
        .task {
            do {
                try await generate()
            } catch {
                debugPrint(error)
            }
        }
    }
}

extension ContentView {
    private func generate() async throws {
        let modelConfiguration = ModelRegistry.llama3_2_1B_4bit
        let modelContainer = try await LLMModelFactory.shared.loadContainer(
            configuration: modelConfiguration
        ) { progress in
            debugPrint("Downloading \(modelConfiguration.name): \(Int(progress.fractionCompleted * 100))%")
        }

        let prompt = "What is really the meaning of life?"

        let _ = try await modelContainer.perform { [prompt] context in
            let input = try await context.processor.prepare(input: .init(prompt: prompt))
            return try MLXLMCommon.generate(
                input: input, parameters: .init(), context: context
            ) { tokens in
                let text = context.tokenizer.decode(tokens: tokens)
                Task { @MainActor in
                    self.output = text
                }
                return .more
            }
        }
    }
}
```
You can use this as boilerplate code to get started with MLX Swift in your app!
Moving Forward
This is a minimal example of using `MLXLLM`. You can now experiment with:

- Different model configurations: using other pre-defined models from `ModelRegistry`
- More sophisticated decoding/generation parameters using `GenerateParameters`

You are ready to start building applications with on-device language models. In my next post, I will show you how to configure a model that is not defined in `ModelRegistry` and start generating text - and you will be surprised at how few lines of code it takes!
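As a taste of what generation parameters typically control: the temperature divides the logits before softmax, so lower values sharpen the token distribution and higher values flatten it. A plain-Swift sketch of that standard math (not MLX code; whether `GenerateParameters` exposes exactly these knobs is something to check in the library):

```swift
import Foundation

// Temperature-scaled softmax over raw logits.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    let scaled = logits.map { $0 / temperature }
    let maxVal = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxVal) }  // subtract max for numerical stability
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

let logits = [2.0, 1.0, 0.0]
let sharp = softmax(logits, temperature: 0.5)  // peaked toward the top token
let flat = softmax(logits, temperature: 2.0)   // closer to uniform
```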
If you are working with MLX Swift, I would love to hear about your experiences. Drop a comment below or reach out on Twitter @rudrankriyam or Bluesky @rudrankriyam.bsky.social!
Happy MLXing!