Exploring MLX Swift: Adding On-Device Inference to your App
44% OFF - Black Friday Sale
Use the coupon "BLACKFRIDAY" at checkout for AI Driven Coding, Foundation Models MLX, MusicKit, freelancing, or technical writing.
I like the output of programming more than the process itself, and that applies to MLX Swift too. So, in this post, you will learn how to easily add on-device inference to your existing app.
What is MLX, Though?
MLX is a machine learning framework by Apple that is specifically designed for Apple silicon. It lets you run language models locally on your devices. This means no round trips to a server and better privacy for your users. You can even use the models on flights or trains with little to no connectivity.
MLX Swift expands MLX to the Swift language, so iOS developers like us do not have to spend time indenting Python code.
GitHub - ml-explore/mlx-swift: Swift API for MLX
We will learn how to use the MLXLMCommon package, a library designed to provide a simple way to use pre-trained Large Language Models (LLMs) for on-device inference. By the end of this post, you will be able to generate responses locally in your app.
TL;DR: Getting Started
Here are the steps involved in adding a model to your application:
- Add the MLX Swift LM package: Add the `MLXLMCommon` package to your project. It provides the model logic, utilities, and helper classes to download models and perform inference. I usually avoid dependencies, but this one definitely makes my life easier.
- Choose a model: Pick a pre-trained model to run on device. For this post, I will choose a lightweight one so it is easier to get started.
- Load the model: Use the `loadModel` function to download and load the model.
- Create a ChatSession: Create a `ChatSession` to manage the conversation.
- Run inference: Use `respond(to:)` or `streamResponse(to:)` to generate output.
Note: The code in this post does not follow iOS best practices. The idea is to get started with MLX Swift in under 15 lines of code and see the output of on-device inference visually.
Adding the MLX Swift LM Package
Add the MLX Swift LM repo as a package dependency to your project:
- In Xcode, open your project and go to the Project Settings -> Package Dependencies
- Press `+`, paste `https://github.com/ml-explore/mlx-swift-lm/`, and select Add Package
- Set the Dependency Rule to Branch and choose the main branch
- Add `MLXLMCommon` to your desired target
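If you manage dependencies in a Package.swift instead of through Xcode, the same setup can be sketched as a manifest fragment. The product and package names below are taken from the Xcode steps above; double-check them against the repository before relying on them:

```swift
// Package.swift fragment — SPM equivalent of the Xcode steps above
dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift-lm/", branch: "main")
],
targets: [
    .target(
        name: "MyApp", // your target name
        dependencies: [
            .product(name: "MLXLMCommon", package: "mlx-swift-lm")
        ]
    )
]
```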
Next, import the package in the file that you prefer:
import MLXLMCommon
Choosing a Model
Models are loaded directly by their Hugging Face ID. For example:
// A small model that works well on many devices
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
These models are hosted on the Hugging Face Hub. For the initial example, let us use Qwen3-4B-4bit, a quantized model that should run on most recent devices.
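Other quantized models from the mlx-community organization load the same way, only the ID changes. The ID below is illustrative; verify it exists on the Hugging Face Hub before using it:

```swift
// An even smaller model, useful for older devices or quick experiments.
// Confirm the exact repository ID on the Hugging Face Hub first.
let tiny = try await loadModel(id: "mlx-community/Llama-3.2-1B-Instruct-4bit")
```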
Loading the Model and Creating a ChatSession
The loadModel function takes care of downloading the model files, creating the model using the correct architecture, and loading the weights from a .safetensors file:
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)
`loadModel(id:)` downloads and loads the model asynchronously. `ChatSession` manages the conversation context and the KV cache for you.
The new ChatSession API takes inspiration from Apple's Foundation Models framework, providing a familiar interface for Swift developers working with on-device language models.
Run Inference
Generate a response using either respond(to:) for a complete response or streamResponse(to:) for streaming:
let prompt = "What is really the meaning of life?"
// Option 1: Get complete response
let response = try await session.respond(to: prompt)
// Option 2: Stream the response
for try await text in session.streamResponse(to: prompt) {
self.output += text
}
`respond(to:)` returns the complete response as a single string. `streamResponse(to:)` returns an `AsyncThrowingStream` that yields text chunks as they are generated, which is perfect for real-time UI updates.
Complete Code Example
Here is the code shown above combined together into a single view with a generate function:
import SwiftUI
import MLXLMCommon
struct ContentView: View {
@State private var output: String = ""
var body: some View {
VStack {
Image(systemName: "globe")
.imageScale(.large)
.foregroundStyle(.tint)
Text("Hello, world!")
Text(output)
}
.padding()
.task {
do {
try await generate()
} catch {
debugPrint(error)
}
}
}
}
extension ContentView {
private func generate() async throws {
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)
let prompt = "What is really the meaning of life?"
for try await text in session.streamResponse(to: prompt) {
self.output += text
}
}
}
You can use this as boilerplate code to get started with MLX Swift in your app!
Moving Forward
This is a minimal version of using MLXLMCommon. You can now experiment with:
- Different models: Try other models from the `mlx-community` organization on Hugging Face
- System instructions: Pass `instructions` to `ChatSession` to customize the model's behavior
- Generation parameters: Use `GenerateParameters` to control temperature, token limits, and more
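The last two ideas can be sketched together. The exact initializer labels for `ChatSession` and the available fields on `GenerateParameters` are assumptions here and may differ between package versions, so treat this as an illustration to check against the current MLXLMCommon API:

```swift
import MLXLMCommon

// Sketch only: the `instructions` and `generateParameters` labels are
// assumptions — verify them against the installed MLXLMCommon version.
func generateWithOptions() async throws -> String {
    let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")

    // System instructions steer the model's tone for the whole session,
    // while GenerateParameters controls sampling behavior.
    let session = ChatSession(
        model,
        instructions: "You are a concise assistant. Answer in one sentence.",
        generateParameters: GenerateParameters(temperature: 0.7)
    )
    return try await session.respond(to: "What is MLX?")
}
```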
You are ready to start building applications with on-device language models. In my next post, I will show you how to configure a different model and start generating text, and you will be surprised at how few lines of code it takes!
If you are working with MLX Swift, I would love to hear about your experience. Reach out on X @rudrank or Bluesky @rudrankriyam.bsky.social!
Happy MLXing!