Exploring AI: Cosine Similarity for RAG using Accelerate and Swift

When searching a whole codebase, it becomes important to find the relevant files to provide as context for inference. Similarly, if you work on an app that helps the user chat with an external document, you retrieve the most similar context to add to the prompt.

In this blog post, I will explore how to implement cosine similarity using Apple's Accelerate framework. It is aimed at developers who want to build Retrieval-Augmented Generation (RAG) systems in their iOS/macOS apps. I will try to break everything down into simple explanations; time to go back to college!

What is Cosine Similarity?

The first time I heard the word cosine was in high school, and the last time was during my Mathematics bachelor's degree. Looks like I am back again to understand what cosine similarity is and why it is useful.

Cosine similarity is a measure that helps determine how similar two vectors (lists of numbers) are by calculating the cosine of the angle between them. The result ranges from -1 to 1, where:

  • 1 means the vectors are identical
  • 0 means they are perpendicular (completely different)
  • -1 means they are opposite

Case 1: Perfect Similarity (Result = 1)

This happens when vectors point in exactly the same direction. They do not need to be identical in magnitude, just in direction.

// Example 1: Identical vectors
let vector1 = [1, 2, 3]
let vector2 = [1, 2, 3]
// Cosine similarity = 1.0

// Example 2: Same direction, different magnitude
let vector3 = [2, 4, 6]  // vector1 multiplied by 2
let vector4 = [1, 2, 3]
// Cosine similarity = 1.0

// In document similarity terms:
let doc1 = [0.8, 0.2, 0.0]  // Document about "doggo" (80% doggo, 20% pets)
let doc2 = [0.4, 0.1, 0.0]  // Another document about "doggo" (same proportions)
// These would have cosine similarity = 1.0 because they are about the same topic

Case 2: Complete Dissimilarity (Result = 0)

This occurs when vectors are perpendicular (at 90 degrees to each other), meaning they have no similarity in direction.

// Example 1: Perpendicular vectors in 2D space
let vector1 = [1, 0]  // Points along x-axis
let vector2 = [0, 1]  // Points along y-axis
// Cosine similarity = 0.0

// In document similarity terms:
let doc1 = [1, 0, 0]  // Document purely about "doggo"
let doc2 = [0, 1, 0]  // Document purely about "cars"
// These would have cosine similarity = 0.0 because they are about completely different topics

Case 3: Perfect Dissimilarity (Result = -1)

This happens when vectors point in exactly opposite directions.

// Example 1: Opposite vectors
let vector1 = [1, 2, 3]
let vector2 = [-1, -2, -3]
// Cosine similarity = -1.0

// Example 2: Opposite direction, different magnitude
let vector3 = [2, 4, 6]
let vector4 = [-1, -2, -3]
// Cosine similarity = -1.0

// In document similarity terms:
let doc1 = [0.8, -0.2]  // Document expressing positive sentiment
let doc2 = [-0.8, 0.2]  // Same topic but opposite sentiment
// These would have cosine similarity = -1.0

Real-World Example in RAG Context

Let's look at a practical example using simple document embeddings:

// Example document embeddings (simplified for demonstration)
// Each vector represents topics: [dogs, pets, cars, technology]

let petShopDoc = [0.8, 0.7, 0.0, 0.0]  // Document about a pet shop
let doggoArticle = [0.9, 0.2, 0.0, 0.0] // Article specifically about doggo
let carReview  = [0.0, 0.0, 0.9, 0.1] // Car review article
let query      = [1.0, 0.0, 0.0, 0.0] // User query about "doggo"

// Expected similarities:
// query ↔ doggoArticle = Very high (≈ 0.98) - Most relevant
// query ↔ petShopDoc   = Moderate (≈ 0.75)  - Somewhat relevant
// query ↔ carReview    = Zero (0.0)         - Not relevant

In RAG systems, we use cosine similarity to find the most relevant documents or pieces of information by comparing their vector embeddings.

TL;DR of Cosine Similarity

Here is the formula for it:

// Cosine Similarity = (A·B) / (||A|| × ||B||)
//
// Where:
// A·B = dot product of vectors
// ||A|| and ||B|| = magnitudes (lengths) of vectors

And the step-by-step breakdown:

// Given two vectors:
let vectorA = [1.0, 2.0, 3.0]
let vectorB = [4.0, 5.0, 6.0]

// 1. Calculate Dot Product (A·B):
dotProduct = (1.0 × 4.0) + (2.0 × 5.0) + (3.0 × 6.0)
// = 4 + 10 + 18
// = 32

// 2. Calculate Magnitude of A (||A||):
magnitudeA = √(1.0² + 2.0² + 3.0²)
// = √(1 + 4 + 9)
// = √14

// 3. Calculate Magnitude of B (||B||):
magnitudeB = √(4.0² + 5.0² + 6.0²)
// = √(16 + 25 + 36)
// = √77

// 4. Final Calculation:
similarity = 32 / (√14 × √77)
// ≈ 0.9746

In plain English:

  1. Multiply corresponding numbers and add them up (dot product)
  2. Calculate the length of the first vector (magnitude A)
  3. Calculate the length of the second vector (magnitude B)
  4. Divide the dot product by the product of the lengths
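
To double-check that arithmetic, here is a minimal sketch in plain Swift, without Accelerate yet; naiveCosineSimilarity is a hypothetical helper used only to verify the numbers:

// A naive, readable reference implementation (hypothetical helper,
// for checking the math only; we will use Accelerate below)
func naiveCosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    let dotProduct = zip(a, b).map(*).reduce(0, +)
    let magnitudeA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let magnitudeB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    return dotProduct / (magnitudeA * magnitudeB)
}

print(naiveCosineSimilarity([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
// ≈ 0.9746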

The functions you will need from Accelerate (plus sqrt from the standard library):

// For Step 1 (Dot Product):
vDSP_dotpr()

// For Steps 2 & 3 (Magnitudes):
vDSP_svesq()  // For squaring and summing
sqrt()        // For final square root

Now that we understand what we are calculating, let us implement it using Accelerate!

Implementing Cosine Similarity

I first define four specific error cases:

  • emptyVectors: When either input vector is empty
  • unequalVectorLengths: When vectors have different lengths
  • zeroVectorFound: When either vector has zero magnitude
  • calculationError: For numerical instability issues

/// Error types specific to cosine similarity calculations
public enum CosineSimilarityError: Error {
    case emptyVectors
    case unequalVectorLengths
    case zeroVectorFound
    case calculationError
}

Then, I define a class and add the main calculate method and a handy calculateBatch one as well:

import Accelerate

/// A class for computing cosine similarity between vectors using Apple's Accelerate framework
public final class CosineSimilarity {

    /// Calculates cosine similarity between two vectors
    ///
    /// - Parameters:
    ///   - vectorA: First vector of embeddings
    ///   - vectorB: Second vector of embeddings
    /// - Returns: Similarity score between -1 and 1
    /// - Throws: CosineSimilarityError for invalid inputs
    public static func calculate(_ vectorA: [Float], _ vectorB: [Float]) throws -> Float {
        // Input validation
        guard !vectorA.isEmpty && !vectorB.isEmpty else {
            throw CosineSimilarityError.emptyVectors
        }

        guard vectorA.count == vectorB.count else {
            throw CosineSimilarityError.unequalVectorLengths
        }

        let count = vDSP_Length(vectorA.count)

        // Calculate dot product (A·B)
        var dotProduct: Float = 0.0
        vDSP_dotpr(vectorA, 1, vectorB, 1, &dotProduct, count)

        // Calculate magnitudes (||A|| and ||B||)
        var magnitudeA: Float = 0.0
        var magnitudeB: Float = 0.0

        // Compute squared magnitudes
        vDSP_svesq(vectorA, 1, &magnitudeA, count)
        vDSP_svesq(vectorB, 1, &magnitudeB, count)

        // Compute final magnitudes
        magnitudeA = sqrt(magnitudeA)
        magnitudeB = sqrt(magnitudeB)

        // Check for zero vectors
        guard magnitudeA > Float.ulpOfOne && magnitudeB > Float.ulpOfOne else {
            throw CosineSimilarityError.zeroVectorFound
        }

        // Calculate similarity
        let similarity = dotProduct / (magnitudeA * magnitudeB)

        // Handle potential numerical instability
        guard similarity.isFinite else {
            throw CosineSimilarityError.calculationError
        }

        // Clamp result to [-1, 1] to handle floating-point precision issues
        return min(max(similarity, -1), 1)
    }

    /// Batch calculation of cosine similarities between one vector and multiple others
    ///
    /// - Parameters:
    ///   - vector: Reference vector
    ///   - vectors: Array of vectors to compare against
    /// - Returns: Array of similarity scores
    /// - Throws: CosineSimilarityError for invalid inputs
    public static func calculateBatch(_ vector: [Float], _ vectors: [[Float]]) throws -> [Float] {
        try vectors.map { try calculate(vector, $0) }
    }
}

I do some input validation to ensure that neither vector is empty and that both vectors have the same length:

// Input validation
guard !vectorA.isEmpty && !vectorB.isEmpty else {
    throw CosineSimilarityError.emptyVectors
}

guard vectorA.count == vectorB.count else {
    throw CosineSimilarityError.unequalVectorLengths
}

Then, I calculate the dot product:

let count = vDSP_Length(vectorA.count)
var dotProduct: Float = 0.0
vDSP_dotpr(vectorA, 1, vectorB, 1, &dotProduct, count)

vDSP_Length converts the count to the correct type for Accelerate, and vDSP_dotpr calculates the dot product.

For the parameters:

  • vectorA, vectorB: Input vectors
  • 1: Stride (distance between elements)
  • &dotProduct: Where to store the result
  • count: Number of elements to process

Remember, the & here means I am passing the memory address of dotProduct instead of the value itself. The Accelerate framework is built on C APIs, which need to write their results directly to memory. This is more efficient because it avoids creating unnecessary copies of data.

In vDSP_dotpr(vectorA, 1, vectorB, 1, &dotProduct, count), I first create dotProduct as a Float with a value of 0.0, then pass its address to vDSP_dotpr. The function knows exactly where to write its calculation results. This is important for performance, especially when dealing with large vectors in RAG apps.
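
To see it in isolation, here is vDSP_dotpr on its own with two tiny vectors, values chosen so the result is easy to verify by hand:

import Accelerate

let a: [Float] = [1, 2, 3]
let b: [Float] = [4, 5, 6]
var result: Float = 0.0

// (1 × 4) + (2 × 5) + (3 × 6) = 32
vDSP_dotpr(a, 1, b, 1, &result, vDSP_Length(a.count))
print(result)  // 32.0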

Then, I calculate the magnitude of each vector:

var magnitudeA: Float = 0.0
var magnitudeB: Float = 0.0

vDSP_svesq(vectorA, 1, &magnitudeA, count)
vDSP_svesq(vectorB, 1, &magnitudeB, count)

magnitudeA = sqrt(magnitudeA)
magnitudeB = sqrt(magnitudeB)

vDSP_svesq computes the sum of the squared vector elements, which is the squared magnitude, and then I take the square root to get each magnitude.
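
As a quick illustration, vDSP_svesq on its own; the value it writes out is the sum of squares, so the magnitude comes from one extra sqrt:

import Accelerate

let v: [Float] = [3.0, 4.0]
var sumOfSquares: Float = 0.0

// 3² + 4² = 25
vDSP_svesq(v, 1, &sumOfSquares, vDSP_Length(v.count))

let magnitude = sqrt(sumOfSquares)
print(magnitude)  // 5.0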

Again, some validation to check for zero vectors:

guard magnitudeA > Float.ulpOfOne && magnitudeB > Float.ulpOfOne else {
    throw CosineSimilarityError.zeroVectorFound
}

Float.ulpOfOne is the smallest meaningful difference from 1.0. This check prevents division by zero and is more precise than checking against 0.0: I am not just catching exact zeros, but also vectors with magnitudes so small that they would cause numerical instability in the cosine similarity calculation. This helps prevent weird results that could come from dividing by numbers that are essentially zero, even if they are not exactly zero!
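
For example, a zero vector would otherwise put a zero in the denominator; with this guard in place, the method throws instead:

do {
    _ = try CosineSimilarity.calculate([0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
} catch CosineSimilarityError.zeroVectorFound {
    print("Caught a zero vector instead of dividing by zero")
} catch {
    print("Unexpected error: \(error)")
}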

And then, I finally do the main calculation:

let similarity = dotProduct / (magnitudeA * magnitudeB)

I calculate final similarity using the formula:

  • cos(θ) = (A·B) / (||A|| × ||B||)

And some housekeeping with result validation:

guard similarity.isFinite else {
    throw CosineSimilarityError.calculationError
}

return min(max(similarity, -1), 1)

This checks for numerical stability (isFinite) and clamps the result to the [-1, 1] range to guard against floating-point precision issues.
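
Putting it all together, a typical call site looks like this, reusing the vectors from the worked example above:

do {
    let similarity = try CosineSimilarity.calculate([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    print(similarity)  // ≈ 0.9746, matching the hand calculation
} catch {
    print("Similarity calculation failed: \(error)")
}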

And a bonus handy batch method:

public static func calculateBatch(_ vector: [Float], _ vectors: [[Float]]) throws -> [Float] {
    try vectors.map { try calculate(vector, $0) }
}

This allows comparison of one vector against many, and uses Swift's map for a sweet one-liner.
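
For example, comparing one made-up query vector against a couple of candidates at once:

let query: [Float] = [1.0, 0.0, 0.0]
let candidates: [[Float]] = [
    [0.9, 0.1, 0.0],  // points almost the same way as the query
    [0.0, 1.0, 0.0]   // perpendicular to the query
]

do {
    let scores = try CosineSimilarity.calculateBatch(query, candidates)
    print(scores)  // ≈ [0.9939, 0.0]
} catch {
    print("Batch calculation failed: \(error)")
}

Note that if any single comparison fails, say one candidate has a different length, the whole batch throws.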

Using it in a RAG System

Here is an example of how to use this CosineSimilarity class in a RAG system:

// Create a simple structure to represent documents
struct Document {
    let id: String
    let content: String
    let embedding: [Float]  // Vector representation
}

final class RAGSystem {
    // Store the documents
    private let documents: [Document]
    
    init(documents: [Document]) {
        self.documents = documents
    }
    
    // Find the most relevant documents for a query
    func findSimilarDocuments(queryEmbedding: [Float], limit: Int = 3) throws -> [(Document, Float)] {
        // Calculate similarities between query and all documents
        let similarities = try CosineSimilarity.calculateBatch(queryEmbedding, documents.map { $0.embedding })
        
        // Zip documents with their similarity scores
        let documentsWithScores = zip(documents, similarities)
        
        // Sort by similarity (highest first) and take top results
        return Array(documentsWithScores.sorted { $0.1 > $1.1 }.prefix(limit))
    }
}

The Document structure is a way to package the text content of each document with its numerical representation (embedding). For example:

let embedding: [Float] = [0.8, 0.2, 0.0]  // A lot of doggo related content

These are numerical representations of text where each number in the array represents how much of a certain topic/concept is present. A rough example is that [0.8, 0.2, 0.0] might mean 80% about doggo, 20% about pets in general, and 0% about cars.

And, then when I get the query, I represent it with another embedding like:

let queryEmbedding: [Float] = [0.9, 0.1, 0.0]  // Query about doggo

This is the numerical representation of what the user is asking about: when the user asks "Tell me about doggo", I convert the query into numbers.

Now I can compare this mathematically with the document embeddings using the findSimilarDocuments method and get the zipped documents and their similarities as an array.

Here is a random doggo related example because why not:

// I ask: "Tell me about training doggos"
let userQuery = "Tell me about training doggos"

// 1. Convert query to embedding (in real app, you would most likely use an API)
let queryEmbedding: [Float] = [0.9, 0.1, 0.0]  // High focus on doggos

// 2. The document collection
let documents = [
    Document(
        id: "1",
        content: "How to train your puppy: Begin with basic commands...",
        embedding: [0.8, 0.2, 0.0]  // Very doggo-related
    ),
    Document(
        id: "2",
        content: "The best car maintenance tips...",
        embedding: [0.0, 0.0, 0.9]  // Car-related
    )
]

let ragSystem = RAGSystem(documents: documents)

// 3. Find similar documents
do {
    let results = try ragSystem.findSimilarDocuments(queryEmbedding: queryEmbedding)
    // Will return the puppy training document first because its embedding
    // is most similar to the query embedding!
    for (document, similarity) in results {
        print("Document: \(document.content)")
        print("Similarity Score: \(similarity)")
        print("---")
    }

} catch {
    print("Error: \(error)")
}

And here is the output:

Document: How to train your puppy: Begin with basic commands...
Similarity Score: 0.9909923
---
Document: The best car maintenance tips...
Similarity Score: 0.0
---

Moving Forward

Going back to the dot product and starting over is fun, and I have barely scratched the surface. This is just one tiny piece of a RAG system. You will also need to:

  • Generate embeddings for the documents (see the sketch after this list)
  • Store and manage the vector database
  • Implement a better and smarter retrieval process
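
For the first bullet, here is a minimal sketch of generating an embedding on-device with Apple's NaturalLanguage framework. This assumes NLEmbedding.sentenceEmbedding is available (iOS 14/macOS 11 and later); a production RAG app would more likely use a dedicated embedding model or API:

import NaturalLanguage

// Sketch only: NLEmbedding returns [Double], so we convert
// to [Float] for the CosineSimilarity API above
if let embedder = NLEmbedding.sentenceEmbedding(for: .english),
   let vector = embedder.vector(for: "How to train your puppy") {
    let embedding = vector.map { Float($0) }
    print(embedding.count)  // dimensionality of the embedding space
}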

This implementation provides a solid foundation to build on, and the next blog posts will cover those topics in detail.

If you have any questions or want to share your experiments, reach out on Twitter @rudrankriyam!

Happy cosining!