When searching a whole codebase, it becomes important to find the relevant files to provide as context for inference. Similarly, if you work on an app that helps the user chat with an external document, you retrieve relevant context to add to the prompt.
In this blog post, I will explore how to implement cosine similarity using Apple's Accelerate framework. It is aimed at developers who want to build Retrieval-Augmented Generation (RAG) systems in their iOS/macOS apps. I will try to break everything down into simple explanations; time to go back to college!
What is Cosine Similarity?
The first time I heard the word cosine was in high school, and the last time was during my Mathematics bachelor's degree. Looks like I am back again to understand what cosine similarity is and why it is useful.
Cosine similarity is a measure that helps determine how similar two vectors (lists of numbers) are by calculating the cosine of the angle between them. The result ranges from -1 to 1, where:
- 1 means the vectors are identical
- 0 means they are perpendicular (completely different)
- -1 means they are opposite
Case 1: Perfect Similarity (Result = 1)
This happens when vectors point in exactly the same direction. They do not need to be identical in magnitude, just in direction.
Case 2: Complete Dissimilarity (Result = 0)
This occurs when vectors are perpendicular (at 90 degrees to each other), meaning they have no similarity in direction.
Case 3: Perfect Dissimilarity (Result = -1)
This happens when vectors point in exactly opposite directions.
Real-World Example in RAG Context
Let's look at a practical example using simple document embeddings:
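A sketch with made-up three-dimensional embeddings (real embedding models produce hundreds of dimensions; these values are purely illustrative):

```swift
// Toy embeddings where each dimension loosely tracks a topic:
// [dogs, pets in general, cars]. All values are assumptions.
let dogDocument: [Float] = [0.9, 0.3, 0.0] // mostly about dogs
let carDocument: [Float] = [0.0, 0.1, 0.9] // mostly about cars
let query: [Float]       = [0.8, 0.2, 0.0] // "Tell me about doggo"
// The query points in nearly the same direction as dogDocument
// (similarity close to 1) and is almost perpendicular to carDocument
// (similarity close to 0).
```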
In RAG systems, we use cosine similarity to find the most relevant documents or pieces of information by comparing their vector embeddings.
TL;DR of Cosine Similarity
Here is the formula for it:
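cos(θ) = (A·B) / (||A|| × ||B||)

where A·B is the dot product of the two vectors and ||A||, ||B|| are their magnitudes (lengths).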
And the step-by-step breakdown:
- Multiply corresponding numbers and add them up (dot product)
- Calculate the length of first vector (magnitude A)
- Calculate the length of second vector (magnitude B)
- Divide dot product by the product of the lengths
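As a quick sanity check with A = [1, 2] and B = [2, 4]: the dot product is 1×2 + 2×4 = 10, the magnitudes are √5 and √20, and 10 / (√5 × √20) = 10 / 10 = 1, confirming that two vectors pointing in the same direction score a perfect 1.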
The methods that you will need from Accelerate:
- `vDSP_dotpr`: calculates the dot product of two vectors
- `vDSP_svesq`: sums the squares of a vector's elements (used for the magnitudes)
- `vDSP_Length`: the type Accelerate expects for element counts

Now that we understand what we are calculating, let us implement it using Accelerate!
Implementing Cosine Similarity
I first define four specific error cases:
- `emptyVectors`: When either input vector is empty
- `unequalVectorLengths`: When vectors have different lengths
- `zeroVectorFound`: When either vector has zero magnitude
- `calculationError`: For numerical instability issues
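A sketch of how those cases might be declared (the enum name `CosineSimilarityError` is my assumption; the post only names the cases):

```swift
/// Errors that the cosine similarity calculation can throw.
enum CosineSimilarityError: Error {
    case emptyVectors          // either input vector is empty
    case unequalVectorLengths  // vectors have different lengths
    case zeroVectorFound       // either vector has (near-)zero magnitude
    case calculationError      // numerical instability in the result
}
```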
Then, I define a class and add the main `calculate` method and a handy `calculateBatch` one as well:
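Here is a sketch of the whole class, so the walkthrough below has something concrete to point at; the method signatures are my assumptions based on the post:

```swift
import Accelerate

final class CosineSimilarity {
    /// Calculates the cosine similarity between two vectors.
    func calculate(_ vectorA: [Float], _ vectorB: [Float]) throws -> Float {
        // 1. Validate the inputs.
        guard !vectorA.isEmpty, !vectorB.isEmpty else {
            throw CosineSimilarityError.emptyVectors
        }
        guard vectorA.count == vectorB.count else {
            throw CosineSimilarityError.unequalVectorLengths
        }

        let count = vDSP_Length(vectorA.count)

        // 2. Dot product: A·B
        var dotProduct: Float = 0.0
        vDSP_dotpr(vectorA, 1, vectorB, 1, &dotProduct, count)

        // 3. Magnitudes: ||A|| and ||B||
        var magnitudeASquared: Float = 0.0
        var magnitudeBSquared: Float = 0.0
        vDSP_svesq(vectorA, 1, &magnitudeASquared, count)
        vDSP_svesq(vectorB, 1, &magnitudeBSquared, count)
        let magnitudeA = magnitudeASquared.squareRoot()
        let magnitudeB = magnitudeBSquared.squareRoot()

        // 4. Reject (near-)zero vectors.
        guard magnitudeA > Float.ulpOfOne, magnitudeB > Float.ulpOfOne else {
            throw CosineSimilarityError.zeroVectorFound
        }

        // 5. cos(θ) = (A·B) / (||A|| × ||B||)
        let similarity = dotProduct / (magnitudeA * magnitudeB)

        // 6. Validate and clamp the result.
        guard similarity.isFinite else {
            throw CosineSimilarityError.calculationError
        }
        return min(max(similarity, -1.0), 1.0)
    }

    /// Compares one query vector against many candidate vectors.
    func calculateBatch(_ query: [Float], _ vectors: [[Float]]) throws -> [Float] {
        try vectors.map { try calculate(query, $0) }
    }
}
```

Each piece is broken down below.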
I do some input validation to check that neither vector is empty and that both vectors have the same length:
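From the `calculate` sketch above:

```swift
guard !vectorA.isEmpty, !vectorB.isEmpty else {
    throw CosineSimilarityError.emptyVectors
}
guard vectorA.count == vectorB.count else {
    throw CosineSimilarityError.unequalVectorLengths
}
```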
Then, I calculate the dot product:
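From the sketch above:

```swift
var dotProduct: Float = 0.0
let count = vDSP_Length(vectorA.count)
vDSP_dotpr(vectorA, 1, vectorB, 1, &dotProduct, count)
```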
`vDSP_Length` converts the count to the correct type for Accelerate, and `vDSP_dotpr` calculates the dot product.
For the parameters:
- `vectorA, vectorB`: Input vectors
- `1`: Stride (distance between elements)
- `&dotProduct`: Where to store the result
- `count`: Number of elements to process
Remember, the `&` here means I am passing the memory address of `dotProduct` instead of the value itself. The Accelerate framework is built on C APIs, and they need to modify the variables directly in memory. It is more efficient because it avoids creating unnecessary copies of data.
In `vDSP_dotpr(vectorA, 1, vectorB, 1, &dotProduct, count)`, I first create `dotProduct` as a `Float` with a value of 0.0, then pass its address to `vDSP_dotpr`. The function knows exactly where to write its calculation results. This is important for performance, especially when dealing with large vectors in RAG apps.
Then, I calculate the magnitude of each vector:
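Again from the sketch above:

```swift
var magnitudeASquared: Float = 0.0
var magnitudeBSquared: Float = 0.0
vDSP_svesq(vectorA, 1, &magnitudeASquared, count)
vDSP_svesq(vectorB, 1, &magnitudeBSquared, count)
let magnitudeA = magnitudeASquared.squareRoot()
let magnitudeB = magnitudeBSquared.squareRoot()
```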
`vDSP_svesq` squares and sums the vector elements, giving the squared magnitude; I then take the square root to get the actual magnitudes.
Again, some validation to check for zero vectors:
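The guard from the sketch above:

```swift
guard magnitudeA > Float.ulpOfOne, magnitudeB > Float.ulpOfOne else {
    throw CosineSimilarityError.zeroVectorFound
}
```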
`Float.ulpOfOne` is the smallest meaningful difference from 1.0. This check prevents division by zero and is more precise than checking against 0.0: I am not just catching exact zeros, but also vectors with magnitudes so small that they would cause numerical instability in the cosine similarity calculation. This helps prevent weird results that could come from dividing by numbers that are essentially zero, even if they are not exactly zero!
And then, I finally do the main calculation:
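From the sketch above, it is a single division:

```swift
let similarity = dotProduct / (magnitudeA * magnitudeB)
```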
I calculate the final similarity using the formula:
- cos(θ) = (A·B) / (||A|| × ||B||)
And some housekeeping with result validation:
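From the sketch above:

```swift
guard similarity.isFinite else {
    throw CosineSimilarityError.calculationError
}
return min(max(similarity, -1.0), 1.0)
```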
This checks for numerical stability (`isFinite`) and clamps the result to the [-1, 1] range to guard against floating-point precision errors.
And a bonus handy batch method:
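As in the sketch above:

```swift
/// Compares one query vector against many candidate vectors.
func calculateBatch(_ query: [Float], _ vectors: [[Float]]) throws -> [Float] {
    try vectors.map { try calculate(query, $0) }
}
```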
This allows comparison of one vector against many and uses Swift's `map` for a sweet implementation of it.
Using it in a RAG System
Here is an example of how to use this `CosineSimilarity` class in a RAG system:
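A sketch of such a setup; `Document` and `findSimilarDocuments` are named in the post, while the retriever type and everything else are my assumptions:

```swift
/// Pairs a document's text with its embedding.
struct Document {
    let content: String
    let embedding: [Float]
}

/// A minimal retriever built on top of CosineSimilarity.
final class RAGRetriever {
    private let similarity = CosineSimilarity()
    private let documents: [Document]

    init(documents: [Document]) {
        self.documents = documents
    }

    /// Returns the topK documents zipped with their similarity to the
    /// query embedding, sorted from most to least similar.
    func findSimilarDocuments(
        queryEmbedding: [Float],
        topK: Int = 3
    ) throws -> [(document: Document, similarity: Float)] {
        let scores = try similarity.calculateBatch(queryEmbedding, documents.map(\.embedding))
        let ranked = zip(documents, scores)
            .map { (document: $0.0, similarity: $0.1) }
            .sorted { $0.similarity > $1.similarity }
        return Array(ranked.prefix(topK))
    }
}
```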
The `Document` structure is a way to package the text content of each document with its numerical representation (embedding). For example:
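A hypothetical instance, with toy values:

```swift
let dogDocument = Document(
    content: "Dogs are loyal companions and love belly rubs.",
    embedding: [0.8, 0.2, 0.0] // assumed toy values
)
```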
These are numerical representations of text where each number in the array represents how much of a certain topic/concept is present. A rough example is that `[0.8, 0.2, 0.0]` might mean that 80% is about doggo, 20% about pets in general, and 0% about cars.
And then, when I get the query, I represent it with another embedding like:
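For example (toy values again):

```swift
// Assumed embedding for the query "Tell me about doggo".
let queryEmbedding: [Float] = [0.9, 0.1, 0.0]
```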
This is the numerical representation of what the user is asking about: when the user asks "Tell me about doggo", I convert it to numbers.
Now I can compare this mathematically with the document embeddings using the `findSimilarDocuments` method and get the documents zipped with their similarities as an array.
Here is a random doggo-related example, because why not:
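This builds on the `Document` and `RAGRetriever` sketches above; the texts and embeddings are made up:

```swift
import Foundation // for String(format:)

let documents = [
    Document(content: "Dogs are loyal companions and love belly rubs.",
             embedding: [0.8, 0.2, 0.0]),
    Document(content: "Cats are independent but affectionate pets.",
             embedding: [0.1, 0.9, 0.0]),
    Document(content: "Cars need regular oil changes and maintenance.",
             embedding: [0.0, 0.1, 0.9])
]

let retriever = RAGRetriever(documents: documents)
let queryEmbedding: [Float] = [0.9, 0.1, 0.0] // "Tell me about doggo"

do {
    let results = try retriever.findSimilarDocuments(queryEmbedding: queryEmbedding)
    for result in results {
        print(String(format: "%.3f", result.similarity), "-", result.document.content)
    }
} catch {
    print("Similarity calculation failed:", error)
}
```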
And here is the output (with the toy embeddings above, the numbers work out like this):
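```
0.991 - Dogs are loyal companions and love belly rubs.
0.220 - Cats are independent but affectionate pets.
0.012 - Cars need regular oil changes and maintenance.
```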

Moving Forward
Going back to the dot product and starting over is fun, and I have barely scratched the surface. This is just one minute piece of a RAG system. You will also need to:
- Generate embeddings for the documents
- Store and manage the vector database
- Implement a better and smarter retrieval process
This implementation provides a solid foundation to build upon. The next blog posts will cover these topics in detail.
If you have any questions or want to share your experiments, reach out on Twitter @rudrankriyam!
Happy cosining!