I have been waiting a while for the PR on tool use to get merged so I could finally publish this post. It is about how to get started with tool use in MLX Swift.

LLMs are powerful at generating text and answering questions based on their training data, but they struggle with tasks that require real-time information or interaction with external systems, such as those on iOS or macOS. This is where you can take advantage of "tool use" (also known as "function calling").

This is an introductory post that explains what tool use is, how to augment LLMs with extra context, and how to use a weather tool to fetch the current weather in Gurgaon, India.

I would also like to thank DePasqualeOrg for their contribution!

Support tool use and add example by DePasqualeOrg · Pull Request #174 · ml-explore/mlx-swift-examples

What is Tool Use?

Tool use allows an LLM to interact with external functions (tools) during its response generation. Instead of directly answering a question, the LLM can, based on the user's prompt, identify the need for a specific tool, form a request to that tool, and then incorporate the tool's output into its final response. For example, you can ask for the current weather, and the model will call a function named get_current_weather.

With tools, the model can look up information and interact with other services, making it far more useful than text generation alone.

Defining a Tool in MLX Swift

Before an LLM can use a tool, you need to define it in a structured way that the LLM can understand. This involves creating a schema that describes the tool's name, purpose, and input parameters. MLX Swift uses a format similar to OpenAI's function calling API, making it straightforward to define your tools.

Let's break down the key components of a tool definition:

let currentWeatherToolSpec: [String: any Sendable] = [
    "type": "function",
    "function": [
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": [
            "type": "object",
            "properties": [
                "location": [
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                ],
                "unit": [
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                ],
            ],
            "required": ["location"],
        ],
    ],
]
  • "type": "function": This top-level key indicates that we are defining a function (a tool).
  • "function": [...]: This dictionary contains the core details of the tool:
    • "name": "get_current_weather": The name of the function. This is how the LLM will refer to the tool when it wants to use it. Choose a descriptive and concise name.
    • "description": "Get the current weather in a given location": A brief description of what the tool does. The LLM uses this description to understand when it should call this tool. Be clear and specific about the tool's purpose.
    • "parameters": [...]: This section defines the input parameters that the tool accepts.
      • "type": "object": The parameters are grouped together as a single object.
      • "properties": [...]: This dictionary describes each individual parameter:
        • "location": [...]: The definition of the location parameter.
          • "type": "string": The parameter's data type is a string.
          • "description": "The city and state, e.g. San Francisco, CA": A description of the parameter, providing context and examples.
        • "unit": [...]: The definition of the unit parameter.
          • "type": "string": The parameter's data type is a string.
          • "enum": ["celsius", "fahrenheit"]: This specifies that the unit parameter can only have one of two values: "celsius" or "fahrenheit". This helps the LLM understand the valid options.
      • "required": ["location"]: This array lists the parameters that are required for the tool to function. In this case, the location is mandatory.

Each tool should follow this consistent format. The description is especially important, as the LLM relies on it to decide when it is appropriate to call the tool.
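
For example, a second, hypothetical tool for reading calendar events (shown here only to illustrate the consistent structure) would follow the exact same shape:

// Hypothetical example: a calendar tool following the same schema.
let upcomingEventsToolSpec: [String: any Sendable] = [
    "type": "function",
    "function": [
        "name": "get_upcoming_events",
        "description": "Get the user's upcoming calendar events for a given day",
        "parameters": [
            "type": "object",
            "properties": [
                "date": [
                    "type": "string",
                    "description": "The day to look up, e.g. 2025-01-15",
                ]
            ],
            "required": ["date"],
        ],
    ],
]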

Example: Getting the Current Weather with MLX Swift

Let's walk through a practical example in MLX Swift where we give an LLM the ability to retrieve the current weather using the tool definition above.

The prepare method of the processor takes the user's prompt and, optionally, the tool definition:

let input = try await context.processor.prepare(
    input: .init(
        messages: [
            ["role": "system", "content": "You are a helpful assistant."],
            ["role": "user", "content": prompt],
        ], tools: includeWeatherTool ? [currentWeatherToolSpec] : nil))

This code snippet shows how the currentWeatherToolSpec is included in the tools parameter when the includeWeatherTool toggle is enabled. The prepare method, using the underlying Jinja templating, formats the prompt and tool definition in a way that the LLM understands.

Here is a simplified version of the tool that fetches weather data based on the location:

// The available tools are defined as a static property:
static let availableTools: [[String: any Sendable]] = [
    [
        "type": "function",
        "function": [
            "name": "get_weather_data",
            "description": "Get current weather data for a specific location",
            "parameters": [
                "type": "object",
                "properties": [
                    "location": [
                        "type": "string",
                        "description": "The city and state, e.g. Gurgaon, Haryana",
                    ]
                ],
                "required": ["location"],
            ],
        ],
    ],
]

The generate function calls MLXLMCommon.generate, which runs the LLM. Seeing the tool definition and the user's prompt ("What's the current weather in Gurgaon, Haryana?"), the model might produce output like this (depending on the specific model; here it is Qwen 2.5 1.5B):

<tool_call> {"name": "get_weather_data", "arguments": {"location": "Gurgaon, Haryana"}} </tool_call>

The important part is that the LLM does not directly answer the question. Instead, it generates a structured output (JSON in this case) indicating that it needs to call the get_weather_data tool with the specified parameters.
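
For reference, the call into MLXLMCommon.generate, loosely following the LLMEval example in the mlx-swift-examples repository, looks roughly like this (a sketch; parameter names and the callback shape may differ slightly between versions):

// Sketch: streaming generation with MLXLMCommon, based on the LLMEval
// example. The GenerateParameters values and token budget are illustrative.
let result = try MLXLMCommon.generate(
    input: input, parameters: GenerateParameters(temperature: 0.6),
    context: context
) { tokens in
    // Decode the tokens generated so far and surface them to the UI.
    let text = context.tokenizer.decode(tokens: tokens)
    print(text)
    // Stop after a token budget; otherwise keep generating.
    return tokens.count >= 1024 ? .stop : .more
}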

Parsing the Tool Call

Now, we need to parse this output. Here is one of the ways to process the output:

func processLLMOutput(_ text: String) async throws -> String? {
    var tokenText = text
    if tokenText.hasPrefix("<tool_call>") {
        tokenText = tokenText.replacingOccurrences(
            of: "<tool_call>", with: "")
    }

    toolCallBuffer += tokenText

    if toolCallBuffer.contains("</tool_call>") {
        toolCallBuffer = toolCallBuffer.replacingOccurrences(
            of: "</tool_call>", with: "")
        let jsonString = toolCallBuffer.trimmingCharacters(
            in: .whitespacesAndNewlines)

        let result = try await handleToolCall(jsonString)
        toolCallBuffer = ""
        return result
    }
    return nil
}

This method does the following:

  • Checks for the start tag: It checks whether the incoming text starts with <tool_call> and strips the tag if so.
  • Buffers the text: It appends the received text to a buffer as the tool call JSON arrives in multiple chunks.
  • Checks for the end tag: It checks if the buffer contains the closing tag </tool_call>.
  • Extracts and parses the JSON: If the complete tool call is received, it extracts the JSON string, removes whitespace, and calls handleToolCall to process it.
  • Resets the buffer: After processing, the buffer is cleared.
  • Returns the tool output: It returns the result of processing the tool call, or nil while the call is still being streamed.
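
In practice, each chunk of streamed model output is fed through this method, and nil comes back until the closing tag arrives. A sketch, where streamedChunks stands in for however your app surfaces the streamed text:

// Sketch: feed each streamed chunk of model output through the parser.
for chunk in streamedChunks {
    if let toolResult = try await processLLMOutput(chunk) {
        // A complete tool call was parsed and executed; toolResult
        // holds the string returned by the tool.
        print(toolResult)
    }
}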

Handling the Tool Call

The handleToolCall method then calls the appropriate function based on the parsed JSON:

private func handleToolCall(_ jsonString: String) async throws -> String {
    guard let data = jsonString.data(using: .utf8) else {
        throw UnifiedEvaluatorError.toolCallParsingFailed("Invalid JSON")
    }

    let toolCall = try decoder.decode(ToolCall.self, from: data)

    switch (toolCall.name, toolCall.arguments) {
    case (.getWeatherData, .weather(let args)):
        return try await fetchWeatherData(for: args.location)

    default:
        throw UnifiedEvaluatorError.invalidToolCall(toolCall.name.rawValue)
    }
}

The ToolCall struct is defined to match the expected JSON format:

struct ToolCall: Codable {
    let name: ToolCallType
    let arguments: Arguments

    enum Arguments: Codable {
        case weather(WeatherArguments)

        enum CodingKeys: String, CodingKey {
            case location
        }

        init(from decoder: Decoder) throws {
            let container = try decoder.container(keyedBy: CodingKeys.self)

            let location = try container.decode(String.self, forKey: .location)
            self = .weather(WeatherArguments(location: location))
        }

        func encode(to encoder: Encoder) throws {
            var container = encoder.container(keyedBy: CodingKeys.self)
            switch self {
            case .weather(let args):
                try container.encode(args.location, forKey: .location)
            }
        }
    }
}

struct WeatherArguments: Codable {
    let location: String
}

enum ToolCallType: String, Codable {
    case getWeatherData = "get_weather_data"
}

This code:

  • Switches on the tool name: It uses a switch statement to determine which tool to execute based on the name property of the ToolCall. You may end up adding more tools to your app, so handle those cases appropriately here (see the sketch after this list).
  • Calls the appropriate function: It calls fetchWeatherData, passing the necessary arguments.
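
Adding a second tool is then mostly a matter of extending the enum and the switch. A sketch, reusing the hypothetical get_upcoming_events tool from earlier (the .calendar arguments case and fetchUpcomingEvents are assumptions you would define alongside it):

// Sketch: extending the dispatch for a second, hypothetical tool.
enum ToolCallType: String, Codable {
    case getWeatherData = "get_weather_data"
    case getUpcomingEvents = "get_upcoming_events"
}

// In handleToolCall:
switch (toolCall.name, toolCall.arguments) {
case (.getWeatherData, .weather(let args)):
    return try await fetchWeatherData(for: args.location)

case (.getUpcomingEvents, .calendar(let args)):
    return try await fetchUpcomingEvents(on: args.date)

default:
    throw UnifiedEvaluatorError.invalidToolCall(toolCall.name.rawValue)
}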

Fetching Weather Data with WeatherKit (and OpenMeteo)

Here is some code for fetching the weather data using the given location:

func fetchWeather(forCity city: String) async throws -> WeatherData {
    let coordinates = try await getCoordinates(for: city)
    return try await fetchWeather(for: coordinates)
}

private func getCoordinates(for city: String) async throws -> CLLocation {
    let geocoder = CLGeocoder()

    do {
        let placemarks = try await geocoder.geocodeAddressString(city)
        guard let location = placemarks.first?.location else {
            throw WeatherKitError.locationNotFound
        }
        return location
    } catch {
        throw WeatherKitError.locationNotFound
    }
}

private func fetchWeather(for location: CLLocation) async throws -> WeatherData {
    do {
        let weather = try await weatherService.weather(for: location)

        let weatherData = WeatherData(
            temperature: weather.currentWeather.temperature.value,
            condition: weather.currentWeather.condition.description,
            humidity: weather.currentWeather.humidity,
            windSpeed: weather.currentWeather.wind.speed.value,
            feelsLike: weather.currentWeather.apparentTemperature.value,
            uvIndex: weather.currentWeather.uvIndex.value,
            visibility: weather.currentWeather.visibility.value,
            pressure: weather.currentWeather.pressure.value,
            precipitationChance: weather.hourlyForecast.first?.precipitationChance ?? 0.0
        )

        return weatherData
    } catch {
        // Fall back to Open-Meteo when WeatherKit is unavailable or fails.
        return try await fetchWeatherFromOpenMeteo(for: location)
    }
}
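
The fetchWeatherFromOpenMeteo fallback is not shown above. A minimal sketch against the public Open-Meteo forecast endpoint might look like this (the fields come from Open-Meteo's current_weather response; mapping them into WeatherData is an assumption, since the API returns fewer fields than WeatherKit):

// Sketch of the Open-Meteo fallback. Fields not provided by the
// current_weather response (humidity, UV index, etc.) are defaulted here.
private func fetchWeatherFromOpenMeteo(for location: CLLocation) async throws -> WeatherData {
    var components = URLComponents(string: "https://api.open-meteo.com/v1/forecast")!
    components.queryItems = [
        URLQueryItem(name: "latitude", value: String(location.coordinate.latitude)),
        URLQueryItem(name: "longitude", value: String(location.coordinate.longitude)),
        URLQueryItem(name: "current_weather", value: "true"),
    ]

    let (data, _) = try await URLSession.shared.data(from: components.url!)

    struct OpenMeteoResponse: Codable {
        struct CurrentWeather: Codable {
            let temperature: Double
            let windspeed: Double
            let weathercode: Int
        }
        let current_weather: CurrentWeather
    }

    let response = try JSONDecoder().decode(OpenMeteoResponse.self, from: data)
    let current = response.current_weather

    return WeatherData(
        temperature: current.temperature,
        condition: "Weather code \(current.weathercode)",
        humidity: 0,
        windSpeed: current.windspeed,
        feelsLike: current.temperature,
        uvIndex: 0,
        visibility: 0,
        pressure: 0,
        precipitationChance: 0
    )
}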

I have omitted the rest of the code for brevity, but you can find the full weather-related implementation here:

MLX-Outil/MLX Outil/Views/WeatherKitManager.swift at main · rudrankriyam/MLX-Outil

Continuing the Conversation

Finally, after retrieving the weather data, we want to present it to the user in a helpful way by continuing the conversation:

func continueConversation(with data: String, for context: String) async {
    let followUpPrompt =
        "The \(context) data is: \(data). You are a weather expert. Please explain the data and provide recommendations based on this information."
    running = false
    await generate(prompt: followUpPrompt, includingTools: false)
}

This method constructs a new prompt that includes the fetched weather data and instructs the LLM to act as a weather expert and provide explanations and recommendations.

Notice that includingTools is set to false for this follow-up prompt, as we do not want the LLM to call the weather tool again.
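
Under the hood, the follow-up generation prepares the input exactly as before, just with tools set to nil so the chat template renders a plain conversation. A sketch mirroring the earlier prepare call:

// Sketch: the follow-up prepare call with no tools attached.
let followUpInput = try await context.processor.prepare(
    input: .init(
        messages: [
            ["role": "system", "content": "You are a helpful assistant."],
            ["role": "user", "content": followUpPrompt],
        ], tools: nil))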

Output

You can now type in a prompt like "What's the weather in Gurgaon, Haryana?" and see the LLM call the weather tool, retrieve the data, and provide a response!

Moving Forward

This was a starter post on tool calling and there are many ways you can utilize it on-device for different use cases. You can take it forward with:

  • Exploring more tools: Create tools for other tasks (e.g., calendar access, health data, web searches)
  • Exploring prompts: Experiment with different prompts to guide the LLM's behavior. This matters when playing with other models like Hermes 3 Llama 3.2, which has a different chat template for tool calling.
  • Exploring interactions: This was one-shot tool calling. You can also feed tool responses back to the model and build multi-turn conversations, as sketched below.
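
For the multi-turn case, one common pattern with Qwen-style chat templates is to append the assistant's tool call and a "tool" role message carrying the result before generating again. Whether the "tool" role is supported depends on your model's chat template, so treat this as a sketch:

// Sketch: a multi-turn transcript where the tool result is fed back
// to the model as a "tool" role message. The weather values are made up.
let messages: [[String: String]] = [
    ["role": "system", "content": "You are a helpful assistant."],
    ["role": "user", "content": "What's the weather in Gurgaon, Haryana?"],
    [
        "role": "assistant",
        "content":
            "<tool_call> {\"name\": \"get_weather_data\", \"arguments\": {\"location\": \"Gurgaon, Haryana\"}} </tool_call>",
    ],
    ["role": "tool", "content": "{\"temperature\": 31.0, \"condition\": \"Sunny\"}"],
]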

Happy MLXing!
