Image-to-Image Search with ArcFaceLoss#
Using image queries to search for visually similar images is a very popular use case. However, pre-trained models often do not deliver the best results out of the box, because they are trained on general data and lack knowledge specific to your task. Here’s where Finetuner comes in! It enables you to easily add task-specific knowledge to a model.
Where another guide showed off fine-tuning with
TripletMarginLoss,
this guide will perform fine-tuning on a dataset with fewer classes, more documents per class, and training data that contains examples from every class in the evaluation data. To improve performance in this case, we will use
ArcFaceLoss as our loss function this time.
Note, please switch to a GPU/TPU Runtime or this will be extremely slow!
!pip install 'finetuner[full]'
We will fine-tune ResNet50 on the Stanford Cars Dataset. This dataset consists of 196 classes across 16184 documents in total. Each class represents a single model of car, and consists of roughly 80 pictures of that model of car.
In order to move documents in the same class (images of the same model of car) closer together and move documents of different classes apart, we use the
ArcFaceLoss function. For more information on how this loss function works, as well as when to use it over
TripletMarginLoss, see Advanced Losses and Optimizers.
After fine-tuning, documents from each class should have similar embeddings, distinct from documents of other classes, meaning that embedding two images of the same model of car will result in similar output vectors.
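To make the idea behind the loss more concrete, here is a minimal pure-Python sketch of an additive angular margin (ArcFace-style) loss for a single embedding. It is not Finetuner's implementation: the function names, the `margin=0.5` and `scale=64.0` defaults, and the plain-list representation of class centres are all assumptions made for illustration, and the edge case where the angle plus margin exceeds π is ignored.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def arcface_loss(embedding, class_centers, label, margin=0.5, scale=64.0):
    """Cross-entropy over cosine logits, with an angular margin on the true class.

    `class_centers` is a hypothetical list of one learnable vector per class.
    """
    e = l2_normalize(embedding)
    logits = []
    for j, center in enumerate(class_centers):
        c = l2_normalize(center)
        # Cosine of the angle between the embedding and the class centre.
        cos_theta = sum(a * b for a, b in zip(e, c))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        theta = math.acos(cos_theta)
        if j == label:
            # Penalise the true class by widening its angle.
            theta += margin
        logits.append(scale * math.cos(theta))
    # Numerically stable softmax cross-entropy for the true label.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum_exp - logits[label]
```

Because the margin is only added to the true class, the model has to pull an embedding noticeably closer to its own class centre than to any other before the loss becomes small, which is what produces tight, well-separated clusters.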
Our journey starts locally. We have to prepare the data and push it to the Jina AI Cloud, after which Finetuner can retrieve the dataset by its name. For this example,
we’ve already prepared the data, and we’ll provide Finetuner with just the names of the training, query and index datasets (e.g. finetuner/stanford-cars-train).
You don’t have to push your data to the Jina AI Cloud before fine-tuning. Instead of a name, you can provide a
DocumentArray and Finetuner will upload your data directly.
Important: If your documents refer to locally stored images, please call
doc.load_uri_to_blob() before starting Finetuner to reduce network transmission and speed up training.
import finetuner
from finetuner import DocumentArray, Document

finetuner.login(force=True)
train_data = DocumentArray.pull('finetuner/stanford-cars-train', show_progress=True)
query_data = DocumentArray.pull('finetuner/stanford-cars-query', show_progress=True)
index_data = DocumentArray.pull('finetuner/stanford-cars-index', show_progress=True)

train_data.summary()
Now let’s see which backbone models we can use. You can see all the available models by calling
For this example, we’re going to use
resnet-base, a model that has been trained on the ImageNet classification task. In the next step, Finetuner will adapt this model, turning it into an embedding model instead.
Now that we have selected our model and loaded the training and evaluation datasets as
DocumentArrays, we can start our fine-tuning run.
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
    model='resnet-base',
    train_data='finetuner/stanford-cars-train',
    batch_size=128,
    epochs=5,
    learning_rate=1e-3,
    loss='ArcFaceLoss',
    device='cuda',
    sampler='random',
    callbacks=[
        EvaluationCallback(
            query_data='finetuner/stanford-cars-query',
            index_data='finetuner/stanford-cars-index',
        )
    ],
)
Let’s understand what this piece of code does:
We select a
We also set
description, which are optional, but strongly recommended so that you can access and retain information about your run.
We specify the training data (train_data).
We use ArcFaceLoss as our loss function.
We use finetuner.callback.EvaluationCallback for evaluation and specify the query and index datasets for it.
finetuner/stanford-cars-query and finetuner/stanford-cars-index are two subsamples of the Stanford cars dataset that have no overlap with each other or our training data.
We set the number of training epochs (epochs) and the learning rate (learning_rate).
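The evaluation step compares each query against the index and scores the ranked results. As a rough illustration of what metrics with names like precision_at_k, hit_at_k and reciprocal_rank usually measure, here is a small sketch that works on class labels of the retrieved items. The function names and the exact definitions (for example, dividing by `k` in precision) are assumptions for illustration and may differ from what Finetuner computes internally.

```python
def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the top-k results that share the query's class."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = retrieved_labels[:k]
    return sum(1 for label in top if label == query_label) / k


def hit_at_k(retrieved_labels, query_label, k):
    """1.0 if at least one of the top-k results has the query's class."""
    return 1.0 if query_label in retrieved_labels[:k] else 0.0


def reciprocal_rank(retrieved_labels, query_label):
    """1 / rank of the first correct result, or 0.0 if there is none."""
    for rank, label in enumerate(retrieved_labels, start=1):
        if label == query_label:
            return 1.0 / rank
    return 0.0
```

For a query of class `'a'` whose ranked results have the labels `['a', 'b', 'a', 'c']`, half of the top four results are correct and the first correct result is at rank one.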
Now that we’ve created a run, we can see its status. You can monitor the state of the run with
run.status(), and use
run.stream_logs() to see the logs.
# note: fine-tuning might take around 30 minutes
for entry in run.stream_logs():
    print(entry)
Since some runs might take up to several hours, it’s important to know how to reconnect to Finetuner and retrieve your runs.
import finetuner

finetuner.login()
run = finetuner.get_run(run.name)
You can continue monitoring the runs by checking the status -
finetuner.run.Run.status() or the logs -
Currently, we don’t have a user-friendly way to get evaluation metrics from the
finetuner.callback.EvaluationCallback we initialized previously.
What you can do for now is to call
run.logs() after the end of the run and see the evaluation results:
Training [5/5] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48/48 0:00:00 0:00:12 • loss: 13.986
INFO Done ✨ __main__.py:195
DEBUG Finetuning took 0 days, 0 hours 3 minutes and 48 seconds __main__.py:197
INFO Metric: 'resnet_base_precision_at_k' before fine-tuning: 0.11575 after fine-tuning: 0.53425 __main__.py:210
INFO Metric: 'resnet_base_recall_at_k' before fine-tuning: 0.05745 after fine-tuning: 0.27113 __main__.py:210
INFO Metric: 'resnet_base_f1_score_at_k' before fine-tuning: 0.07631 after fine-tuning: 0.35788 __main__.py:210
INFO Metric: 'resnet_base_hit_at_k' before fine-tuning: 0.82900 after fine-tuning: 0.94100 __main__.py:210
INFO Metric: 'resnet_base_average_precision' before fine-tuning: 0.52305 after fine-tuning: 0.79779 __main__.py:210
INFO Metric: 'resnet_base_reciprocal_rank' before fine-tuning: 0.64909 after fine-tuning: 0.89224 __main__.py:210
INFO Metric: 'resnet_base_dcg_at_k' before fine-tuning: 1.30710 after fine-tuning: 4.52143 __main__.py:210
INFO Building the artifact ... __main__.py:215
INFO Pushing artifact to Jina AI Cloud ... __main__.py:241
[12:19:53] INFO Artifact pushed under ID '63f8a9089c6406e19244771d' __main__.py:243
DEBUG Artifact size is 83.580 MB __main__.py:245
INFO Finished 🚀 __main__.py:246
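If you want the before/after values as numbers rather than reading them off the log, one option is a small regular-expression parser. The sketch below assumes you already have the log text as a plain Python string and that the metric lines follow the pattern shown above. The `parse_metrics` helper and its pattern are hypothetical, not part of Finetuner. The pattern tolerates a source-location token such as `__main__.py:210` appearing between the "after fine-tuning:" label and its value, which can happen when the console wraps long lines.

```python
import re

# Matches lines like:
#   Metric: 'name' before fine-tuning: 0.1 after fine-tuning: 0.5
# optionally with a "file.py:123" token before the second number.
METRIC_PATTERN = re.compile(
    r"Metric: '(?P<name>[^']+)' before fine-tuning: (?P<before>\d+\.\d+)"
    r" after fine-tuning:\s*(?:\S+\.py:\d+\s+)?(?P<after>\d+\.\d+)"
)


def parse_metrics(log_text):
    """Return {metric_name: (before, after)} for every metric line found."""
    return {
        match.group("name"): (
            float(match.group("before")),
            float(match.group("after")),
        )
        for match in METRIC_PATTERN.finditer(log_text)
    }
```

Calling `parse_metrics` on the log text gives a dictionary you can sort, plot or compare across runs.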
After the run has finished successfully, you can download the tuned model on your local machine:
artifact = run.save_artifact('resnet-model')
Now that you have saved the
artifact to your host machine,
let’s use the fine-tuned model to encode a new
Inference with ONNX
In case you set
to_onnx=True when calling
model = finetuner.get_model(artifact, is_onnx=True)
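# Use a single image from the query set; the assert below expects one embedding
query = DocumentArray([query_data[0]])

model = finetuner.get_model(artifact=artifact, device='cuda')

# Embeddings are written onto the documents in place
finetuner.encode(model=model, data=query)
finetuner.encode(model=model, data=index_data)

assert query.embeddings.shape == (1, 2048)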
And finally, you can use the embedded
query to find top-k visually related images within
index_data as follows:
query.match(index_data, limit=10, metric='cosine')
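Conceptually, a cosine top-k match ranks every index embedding by its cosine distance to the query and keeps the closest ones. The brute-force sketch below illustrates that idea on plain Python lists. It is not how DocumentArray implements matching, and the helper names are made up for this example.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_matches(query_embedding, index_embeddings, k):
    """Return the positions of the k index vectors closest to the query."""
    scored = [
        (cosine_distance(query_embedding, vector), position)
        for position, vector in enumerate(index_embeddings)
    ]
    scored.sort()  # smallest distance first
    return [position for _, position in scored[:k]]
```

With 2048-dimensional embeddings the logic is the same, only the vectors are longer.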
Before and after#
We can directly compare the results of our fine-tuned model with its zero-shot counterpart to get a better idea of how fine-tuning affects the results of a search. Each class of the Stanford cars dataset contains images for a single model of car. Therefore, we can define a ‘good’ search result as an image of a car that is the same model as the car in the query image, and not necessarily images of cars that are taken at a similar angle, or are the same colour.
The example below shows exactly this: