Image-to-Image Search with ArcFaceLoss

Using image queries to search for visually similar images is a very popular use case. However, pre-trained models do not deliver the best results out of the box, because they are trained on general data that lacks knowledge specific to your task. Here’s where Finetuner comes in! It enables you to easily add task-specific knowledge to a model.

While another guide demonstrated fine-tuning with TripletMarginLoss, this guide fine-tunes on a dataset with fewer classes, more documents per class, and training data that contains examples from every class in the evaluation data. To improve performance in this setting, we use ArcFaceLoss as our loss function.

Note, please switch to a GPU/TPU Runtime or this will be extremely slow!


!pip install 'finetuner[full]'


We will fine-tune ResNet50 on the Stanford Cars Dataset. This dataset consists of 196 classes across 16184 documents in total. Each class represents a single model of car, and consists of roughly 80 pictures of that model of car.

To move documents of the same class (images of the same model of car) closer together and push documents of different classes apart, we use the ArcFaceLoss function. For more information on how this loss function works, and on when to use it over TripletMarginLoss, see Advanced Losses and Optimizers.

After fine-tuning, documents from each class should have similar embeddings, distinct from documents of other classes, meaning that embedding two images of the same model of car will result in similar output vectors.
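To make the idea concrete, here is a minimal sketch of an additive angular margin loss for a single example, written in plain Python. It is an illustration of the general ArcFace formulation, not Finetuner's implementation: the function name, the `scale` and `margin` defaults, and the use of explicit class-center vectors are all assumptions chosen for readability.

```python
import math


def arcface_loss(embedding, class_centers, target, scale=30.0, margin=0.5):
    """Additive angular margin loss for one example (illustrative sketch).

    `scale` and `margin` are hypothetical defaults, not Finetuner's settings.
    """

    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    emb = normalize(embedding)
    logits = []
    for idx, center in enumerate(class_centers):
        cen = normalize(center)
        # Cosine of the angle between the embedding and this class center.
        cos_theta = sum(a * b for a, b in zip(emb, cen))
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if idx == target:
            # Penalise the true class by widening its angle with a margin.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)

    # Numerically stable softmax cross-entropy on the adjusted logits.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum_exp - logits[target]


centers = [[1.0, 0.0], [0.0, 1.0]]
print(arcface_loss([1.0, 0.1], centers, target=0, scale=10.0))
print(arcface_loss([1.0, 0.1], centers, target=1, scale=10.0))
```

Because the margin is added only to the angle of the correct class, an embedding has to sit well inside its own class region before the loss becomes small, which is what pulls same-class embeddings together during training.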


Our journey starts locally. We have to prepare the data and push it to the Jina AI Cloud, after which Finetuner can retrieve each dataset by its name. For this example, we’ve already prepared the data, so we only need to provide Finetuner with the names of the training, query and index datasets (e.g. stanford-cars-train).


You don’t have to push your data to the Jina AI Cloud before fine-tuning. Instead of a name, you can provide a DocumentArray and Finetuner will upload your data directly. Important: If your documents refer to locally stored images, please call doc.load_uri_to_blob() before starting Finetuner to reduce network transmission and speed up training.

import finetuner
from finetuner import DocumentArray, Document

train_data = DocumentArray.pull('finetuner/stanford-cars-train', show_progress=True)
query_data = DocumentArray.pull('finetuner/stanford-cars-query', show_progress=True)
index_data = DocumentArray.pull('finetuner/stanford-cars-index', show_progress=True)


Backbone model

Now let’s see which backbone models we can use. You can see all the available models by calling finetuner.describe_models().

For this example, we’ll go with resnet-base, a model that has been trained on the ImageNet classification task. In the next step, Finetuner will adapt this model, turning it into an embedding model instead.


Now that we have selected our model and loaded the training and evaluation datasets as DocumentArrays, we can start our fine-tuning run.

from finetuner.callback import EvaluationCallback

run =

Let’s understand what this piece of code does:

  • We select a model: resnet-base.

  • We also set run_name and description, which are optional, but strongly recommended so that you can access and retain information about your run.

  • We specify the training data (train_data).

  • We set ArcFaceLoss as our loss function.

  • We use finetuner.callback.EvaluationCallback for evaluation and specify the query and index datasets for it. finetuner/stanford-cars-query and finetuner/stanford-cars-index are two subsamples of the Stanford cars dataset that have no overlap with each other or our training data.

  • We set the number of training epochs (epochs) and the learning rate (learning_rate).
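The bullet points above can be sketched as a single call. This is a hedged reconstruction rather than the guide's original code: the run name, description and learning rate are placeholders, and `epochs=5` is only inferred from the `Training [5/5]` line in the log output shown later. The helper name `start_run` is hypothetical, and the Finetuner imports are deferred so the snippet can be defined without the package installed.

```python
# Hypothetical settings; only the model and loss names come from this guide.
FIT_SETTINGS = {
    'model': 'resnet-base',
    'loss': 'ArcFaceLoss',
    'epochs': 5,            # inferred from the "Training [5/5]" log line
    'learning_rate': 1e-4,  # placeholder value, tune for your data
}


def start_run(train_data, query_data, index_data):
    """Submit a fine-tuning run that mirrors the bullet points (sketch)."""
    # Imported lazily so defining this helper does not require Finetuner.
    import finetuner
    from finetuner.callback import EvaluationCallback

    return finetuner.fit(
        train_data=train_data,
        run_name='resnet-arcface-cars',                   # hypothetical name
        description='Fine-tune resnet-base with ArcFaceLoss',  # hypothetical
        callbacks=[
            # Evaluates the model on the query and index sets.
            EvaluationCallback(query_data=query_data, index_data=index_data)
        ],
        **FIT_SETTINGS,
    )
```

Calling `start_run(train_data, query_data, index_data)` with the DocumentArrays loaded earlier would then submit the run, assuming you are logged in to the Jina AI Cloud.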


Now that we’ve created a run, we can see its status. You can monitor the state of the run with run.status(), and use run.logs() or run.stream_logs() to see the logs.

# Note: fine-tuning might take around 30 minutes
for entry in run.stream_logs():

Since some runs might take up to several hours, it’s important to know how to reconnect to Finetuner and retrieve your runs.

import finetuner

run = finetuner.get_run(

You can continue monitoring the run by checking its status with run.status() or its logs with run.logs().


Currently, we don’t have a user-friendly way to get evaluation metrics from the finetuner.callback.EvaluationCallback we initialized previously. What you can do for now is to call run.logs() after the end of the run and see the evaluation results:

Training [5/5] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48/48 0:00:00 0:00:12  loss: 13.986
INFO     Done                                                                     
DEBUG    Finetuning took 0 days, 0 hours 3 minutes and 48 seconds                   
INFO     Metric: 'resnet_base_precision_at_k' before fine-tuning:  0.11575 after fine-tuning:
INFO     Metric: 'resnet_base_recall_at_k' before fine-tuning:  0.05745 after fine-tuning:
INFO     Metric: 'resnet_base_f1_score_at_k' before fine-tuning:  0.07631 after fine-tuning:
INFO     Metric: 'resnet_base_hit_at_k' before fine-tuning:  0.82900 after fine-tuning: 0.94100
INFO     Metric: 'resnet_base_average_precision' before fine-tuning:  0.52305 after fine-tuning:
INFO     Metric: 'resnet_base_reciprocal_rank' before fine-tuning:  0.64909 after fine-tuning:
INFO     Metric: 'resnet_base_dcg_at_k' before fine-tuning:  1.30710 after fine-tuning: 4.52143
INFO     Building the artifact ...                                                  
INFO     Pushing artifact to Jina AI Cloud ...                                      
[12:19:53] INFO     Artifact pushed under ID '63f8a9089c6406e19244771d'                        
DEBUG    Artifact size is 83.580 MB                                                 
INFO     Finished 🚀                                                                
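To help interpret the metric names in this log, the sketch below computes a few of them for a single query, given the class labels of its ranked search results. It is a simplified illustration under our own assumptions (the function name, the argument names and the exact definitions are ours), so the numbers Finetuner reports may be computed slightly differently.

```python
def retrieval_metrics(ranked_labels, query_label, num_relevant, k=10):
    """Compute simple retrieval metrics for one query (illustrative sketch).

    ranked_labels: class labels of the search results, best match first.
    num_relevant:  number of index documents sharing the query's class.
    """
    hits = [label == query_label for label in ranked_labels[:k]]
    n_hits = sum(hits)

    # Reciprocal rank of the first correct result in the full ranking.
    reciprocal_rank = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            reciprocal_rank = 1.0 / rank
            break

    return {
        'precision_at_k': n_hits / k,
        'recall_at_k': n_hits / num_relevant if num_relevant else 0.0,
        'hit_at_k': 1.0 if n_hits else 0.0,
        'reciprocal_rank': reciprocal_rank,
    }


print(retrieval_metrics(['b', 'a', 'a', 'c'], 'a', num_relevant=4, k=4))
```

Averaging these per-query values over all queries gives dataset-level scores comparable in spirit to the ones listed in the log.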


After the run has finished successfully, you can download the tuned model on your local machine:

artifact = run.save_artifact('resnet-model')


Now that you’ve saved the artifact to your host machine, let’s use the fine-tuned model to encode a new Document:

Inference with ONNX

If you set to_onnx=True when starting the fine-tuning run, please use model = finetuner.get_model(artifact, is_onnx=True) instead.

query = DocumentArray([query_data[0]])

model = finetuner.get_model(artifact=artifact, device='cuda')

finetuner.encode(model=model, data=query)
finetuner.encode(model=model, data=index_data)

assert query.embeddings.shape == (1, 2048)

And finally, you can use the embedded query to find top-k visually related images within index_data as follows:

query.match(index_data, limit=10, metric='cosine')
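Conceptually, a cosine-metric match ranks every index embedding by its cosine distance to the query embedding and keeps the closest ones. The plain-Python sketch below illustrates that idea on toy vectors; the helper names are hypothetical, and this is not how `match` is implemented internally.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_matches(query_vec, index_vecs, limit=10):
    """Return (index position, distance) pairs for the closest vectors."""
    scored = [
        (cosine_distance(query_vec, vec), pos)
        for pos, vec in enumerate(index_vecs)
    ]
    scored.sort()  # smallest distance first
    return [(pos, dist) for dist, pos in scored[:limit]]


toy_index = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]
print(top_k_matches([1.0, 0.0], toy_index, limit=2))
```

Since cosine distance ignores vector length, only the direction of each embedding matters for the ranking.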

Before and after

We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how fine-tuning affects the results of a search. Each class of the Stanford cars dataset contains images for a single model of car. Therefore, we can define a ‘good’ search result as an image of a car that is the same model as the car in the query image, and not necessarily images of cars that are taken at a similar angle, or are the same colour.
The example below shows exactly this: