finetuner package

Subpackages

Submodules

Module contents

finetuner.login(force=False, interactive=None)

Log in to Jina AI Cloud to use cloud-based fine-tuning. This generates an authentication token, which can be read with the get_token() function.

Parameters:
  • force (bool) – If set to True, an existing token is overwritten. Otherwise, if a valid token already exists, you are not logged in again.

  • interactive (Optional[bool]) – Enables interactive mode, which should be set in Jupyter environments.

finetuner.list_callbacks()

List available callbacks.

Return type:

Dict[str, ~CallbackStubType]

finetuner.list_models()

List available models.

Return type:

List[str]

finetuner.list_model_options()

List available options per model.

Return type:

Dict[str, List[Dict[str, Any]]]

finetuner.describe_models(task=None)

Print model information, such as name, task, output dimension, architecture and description as a table.

Parameters:

task (Optional[str]) – The task of the backbone model: one of text-to-text, text-to-image or image-to-image. If not provided, all backbone models are printed.

Return type:

None

finetuner.fit(model, train_data, eval_data=None, val_split=0.0, run_name=None, description=None, experiment_name=None, model_options=None, loss='TripletMarginLoss', miner=None, miner_options=None, optimizer='Adam', optimizer_options=None, learning_rate=None, epochs=5, batch_size=64, callbacks=None, scheduler_step='batch', freeze=False, output_dim=None, device='cuda', num_workers=4, to_onnx=False, csv_options=None, public=False, num_items_per_class=4)

Create a Finetuner Run. Calling this function submits a fine-tuning job to Jina AI Cloud.

Parameters:
  • model (str) – The name of the model to be fine-tuned. Run finetuner.list_models() or finetuner.describe_models() to see the available model names.

  • train_data (Union[str, TextIO, DocumentArray]) – A DocumentArray of training data, the name of a DocumentArray that has been pushed to Jina AI Cloud, or a path to a CSV file.

  • eval_data (Union[str, TextIO, DocumentArray, None]) – A DocumentArray of evaluation data, the name of a DocumentArray that has been pushed to Jina AI Cloud, or a path to a CSV file.

  • val_split (float) – The fraction of train_data held out to calculate a validation loss. If it is set to 0, or if eval_data is provided, no data is held out from the training data; instead, eval_data, if provided, is used to calculate the validation loss.

  • run_name (Optional[str]) – Name of the run.

  • description (Optional[str]) – Run description.

  • experiment_name (Optional[str]) – Name of the experiment.

  • model_options (Optional[Dict[str, Any]]) – Additional arguments passed to the model constructor. These options are model-specific and differ depending on the model you choose. Run finetuner.list_model_options() to see the available options for each model.

  • loss (str) – Name of the loss function used for fine-tuning. Default is TripletMarginLoss. Options: CosFaceLoss, NTXLoss, AngularLoss, ArcFaceLoss, BaseMetricLossFunction, MultipleLosses, CentroidTripletLoss, CircleLoss, ContrastiveLoss, CrossBatchMemory, FastAPLoss, GenericPairLoss, IntraPairVarianceLoss, LargeMarginSoftmaxLoss, GeneralizedLiftedStructureLoss, LiftedStructureLoss, MarginLoss, EmbeddingRegularizerMixin, WeightRegularizerMixin, MultiSimilarityLoss, NPairsLoss, NCALoss, NormalizedSoftmaxLoss, ProxyAnchorLoss, ProxyNCALoss, SignalToNoiseRatioContrastiveLoss, SoftTripleLoss, SphereFaceLoss, SupConLoss, TripletMarginLoss, TupletMarginLoss, VICRegLoss, CLIPLoss.

  • miner (Optional[str]) – Name of the miner to create tuple indices for the loss function. Options: AngularMiner, BaseMiner, BaseSubsetBatchMiner, BaseTupleMiner, BatchEasyHardMiner, BatchHardMiner, DistanceWeightedMiner, HDCMiner, EmbeddingsAlreadyPackagedAsTriplets, MaximumLossMiner, PairMarginMiner, MultiSimilarityMiner, TripletMarginMiner, UniformHistogramMiner.

  • miner_options (Optional[Dict[str, Any]]) – Additional parameters passed to the miner constructor. The applicable parameters depend on the miner you choose. See the PyTorch Metric Learning documentation for details.

  • optimizer (str) – Name of the optimizer used for fine-tuning. Options: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, SGD.

  • optimizer_options (Optional[Dict[str, Any]]) – Additional parameters passed to the optimizer constructor. The applicable parameters depend on the optimizer you choose. See the PyTorch documentation for details.

  • learning_rate (Optional[float]) – Learning rate for the optimizer.

  • epochs (int) – Number of epochs for fine-tuning.

  • batch_size (int) – Number of items to include in a batch.

  • callbacks (Optional[List[~CallbackStubType]]) – List of callback stub objects. See the callback subpackage for available options, or run finetuner.list_callbacks().

  • scheduler_step (str) – The interval at which the learning rate scheduler’s step function is called. Valid options are batch and epoch.

  • freeze (bool) – If set to True, all layers except the last one are frozen.

  • output_dim (Optional[int]) – The expected output dimension. If set, a projection head is attached.

  • device (str) – The device to run on. If set to cuda, an NVIDIA GPU is used; set it to cpu to run a CPU job.

  • num_workers (int) – Number of CPU workers. When running on a GPU (i.e., device is not cpu), this is the number of workers used by the dataloader.

  • to_onnx (bool) – Set to True to convert the model to ONNX format. Note that not all models support this. If set, pass is_onnx=True at inference time, e.g., when calling get_model().

  • csv_options (Optional[CSVOptions]) – A CSVOptions object containing the options used to read training and evaluation data from CSV files, if the data is provided in that form.

  • public (bool) – Whether the artifact is public. Set it to True if you want to share your fine-tuned model with others.

  • num_items_per_class (int) – How many items per class (unique label) to include in a batch. For example, if batch_size is 20 and num_items_per_class is 4, each batch consists of 4 items from each of 5 classes. batch_size must be divisible by num_items_per_class.

Note

Unless necessary, stick with device="cuda"; CPU training can be extremely slow and inefficient.

Return type:

Run
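The num_items_per_class rule describes class-balanced batching (sometimes called P×K sampling: P classes, K items each). The sketch below illustrates that arithmetic in plain Python under stated assumptions: class_balanced_batches is a hypothetical helper, not part of the finetuner API, and it assumes leftover items that do not fill a complete group of num_items_per_class are simply dropped. Finetuner's own sampler may handle such remainders differently.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def class_balanced_batches(
    labels: Sequence[Hashable],
    batch_size: int,
    num_items_per_class: int,
    seed: int = 0,
) -> List[List[int]]:
    """Hypothetical helper: build batches of item indices in which each of
    batch_size // num_items_per_class distinct classes contributes exactly
    num_items_per_class items."""
    if batch_size % num_items_per_class != 0:
        raise ValueError("batch_size must be divisible by num_items_per_class")
    classes_per_batch = batch_size // num_items_per_class
    rng = random.Random(seed)

    # Bucket item indices by their label.
    by_label: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    # Cut each class into chunks of num_items_per_class (assumption: remainders are dropped).
    queues: Dict[Hashable, List[List[int]]] = {}
    for label, indices in by_label.items():
        rng.shuffle(indices)
        n_full = len(indices) // num_items_per_class
        queues[label] = [
            indices[i * num_items_per_class:(i + 1) * num_items_per_class]
            for i in range(n_full)
        ]

    batches: List[List[int]] = []
    while True:
        available = [label for label, chunks in queues.items() if chunks]
        if len(available) < classes_per_batch:
            break  # not enough distinct classes left to fill another batch
        batch: List[int] = []
        # Sampling distinct labels guarantees no class appears twice in a batch.
        for label in rng.sample(available, classes_per_batch):
            batch.extend(queues[label].pop())
        batches.append(batch)
    return batches
```

With 5 classes of 8 items each, batch_size=20 and num_items_per_class=4 yield two batches, each holding 4 items from every one of the 5 classes, matching the example in the parameter description.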

finetuner.create_run(model, train_data, eval_data=None, val_split=0.0, run_name=None, description=None, experiment_name=None, model_options=None, loss='TripletMarginLoss', miner=None, miner_options=None, optimizer='Adam', optimizer_options=None, learning_rate=None, epochs=5, batch_size=64, callbacks=None, scheduler_step='batch', freeze=False, output_dim=None, device='cuda', num_workers=4, to_onnx=False, csv_options=None, public=False, num_items_per_class=4)

Create a Finetuner Run. Calling this function submits a fine-tuning job to Jina AI Cloud.

Parameters:
  • model (str) – The name of the model to be fine-tuned. Run finetuner.list_models() or finetuner.describe_models() to see the available model names.

  • train_data (Union[str, TextIO, DocumentArray]) – A DocumentArray of training data, the name of a DocumentArray that has been pushed to Jina AI Cloud, or a path to a CSV file.

  • eval_data (Union[str, TextIO, DocumentArray, None]) – A DocumentArray of evaluation data, the name of a DocumentArray that has been pushed to Jina AI Cloud, or a path to a CSV file.

  • val_split (float) – The fraction of train_data held out to calculate a validation loss. If it is set to 0, or if eval_data is provided, no data is held out from the training data; instead, eval_data, if provided, is used to calculate the validation loss.

  • run_name (Optional[str]) – Name of the run.

  • description (Optional[str]) – Run description.

  • experiment_name (Optional[str]) – Name of the experiment.

  • model_options (Optional[Dict[str, Any]]) – Additional arguments passed to the model constructor. These options are model-specific and differ depending on the model you choose. Run finetuner.list_model_options() to see the available options for each model.

  • loss (str) – Name of the loss function used for fine-tuning. Default is TripletMarginLoss. Options: CosFaceLoss, NTXLoss, AngularLoss, ArcFaceLoss, BaseMetricLossFunction, MultipleLosses, CentroidTripletLoss, CircleLoss, ContrastiveLoss, CrossBatchMemory, FastAPLoss, GenericPairLoss, IntraPairVarianceLoss, LargeMarginSoftmaxLoss, GeneralizedLiftedStructureLoss, LiftedStructureLoss, MarginLoss, EmbeddingRegularizerMixin, WeightRegularizerMixin, MultiSimilarityLoss, NPairsLoss, NCALoss, NormalizedSoftmaxLoss, ProxyAnchorLoss, ProxyNCALoss, SignalToNoiseRatioContrastiveLoss, SoftTripleLoss, SphereFaceLoss, SupConLoss, TripletMarginLoss, TupletMarginLoss, VICRegLoss, CLIPLoss.

  • miner (Optional[str]) – Name of the miner to create tuple indices for the loss function. Options: AngularMiner, BaseMiner, BaseSubsetBatchMiner, BaseTupleMiner, BatchEasyHardMiner, BatchHardMiner, DistanceWeightedMiner, HDCMiner, EmbeddingsAlreadyPackagedAsTriplets, MaximumLossMiner, PairMarginMiner, MultiSimilarityMiner, TripletMarginMiner, UniformHistogramMiner.

  • miner_options (Optional[Dict[str, Any]]) – Additional parameters passed to the miner constructor. The applicable parameters depend on the miner you choose. See the PyTorch Metric Learning documentation for details.

  • optimizer (str) – Name of the optimizer used for fine-tuning. Options: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, SGD.

  • optimizer_options (Optional[Dict[str, Any]]) – Additional parameters passed to the optimizer constructor. The applicable parameters depend on the optimizer you choose. See the PyTorch documentation for details.

  • learning_rate (Optional[float]) – Learning rate for the optimizer.

  • epochs (int) – Number of epochs for fine-tuning.

  • batch_size (int) – Number of items to include in a batch.

  • callbacks (Optional[List[~CallbackStubType]]) – List of callback stub objects. See the callback subpackage for available options, or run finetuner.list_callbacks().

  • scheduler_step (str) – The interval at which the learning rate scheduler’s step function is called. Valid options are batch and epoch.

  • freeze (bool) – If set to True, all layers except the last one are frozen.

  • output_dim (Optional[int]) – The expected output dimension. If set, a projection head is attached.

  • device (str) – The device to run on. If set to cuda, an NVIDIA GPU is used; set it to cpu to run a CPU job.

  • num_workers (int) – Number of CPU workers. When running on a GPU (i.e., device is not cpu), this is the number of workers used by the dataloader.

  • to_onnx (bool) – Set to True to convert the model to ONNX format. Note that not all models support this. If set, pass is_onnx=True at inference time, e.g., when calling get_model().

  • csv_options (Optional[CSVOptions]) – A CSVOptions object containing the options used to read training and evaluation data from CSV files, if the data is provided in that form.

  • public (bool) – Whether the artifact is public. Set it to True if you want to share your fine-tuned model with others.

  • num_items_per_class (int) – How many items per class (unique label) to include in a batch. For example, if batch_size is 20 and num_items_per_class is 4, each batch consists of 4 items from each of 5 classes. batch_size must be divisible by num_items_per_class.

Note

Unless necessary, stick with device="cuda"; CPU training can be extremely slow and inefficient.

Return type:

Run
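The val_split and scheduler_step rules reduce to simple arithmetic on the training set size. The following sketch uses hypothetical helper names (holdout_split, scheduler_steps) and illustrates only the documented rules, not Finetuner's internals. It assumes the held-out portion is taken from the front of train_data (Finetuner may shuffle first) and that a final partial batch still counts as one scheduler step.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    train: Sequence[T], val_split: float, has_eval_data: bool
) -> Tuple[List[T], List[T]]:
    """Return (train_part, val_part) following the documented val_split rule."""
    if not 0.0 <= val_split < 1.0:
        # Sanity check added for this sketch only.
        raise ValueError("val_split must be in [0, 1)")
    if val_split == 0.0 or has_eval_data:
        # Nothing is held out; eval_data (if any) is used for validation instead.
        return list(train), []
    n_val = int(len(train) * val_split)
    return list(train[n_val:]), list(train[:n_val])


def scheduler_steps(n_train: int, batch_size: int, epochs: int, scheduler_step: str) -> int:
    """Count learning rate scheduler step calls for scheduler_step='batch' or 'epoch'."""
    if scheduler_step == "batch":
        # Assumption: the last, possibly partial, batch of each epoch is kept.
        return epochs * math.ceil(n_train / batch_size)
    if scheduler_step == "epoch":
        return epochs
    raise ValueError("scheduler_step must be 'batch' or 'epoch'")
```

For 1,000 training items with val_split=0.2 and no eval_data, 800 items remain for training; with the default batch_size=64 and epochs=5, scheduler_step='batch' gives 5 × ceil(800 / 64) = 65 scheduler steps, while 'epoch' gives 5.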

finetuner.get_run(run_name, experiment_name=None)

Get a Run by its name and (optional) Experiment name.

If an experiment name is not specified, we’ll look for the run in the default experiment.

Parameters:
  • run_name (str) – Name of the Run.

  • experiment_name (Optional[str]) – Optional name of the Experiment.

Return type:

Run

Returns:

A Run object.

finetuner.list_runs(experiment_name=None, page=1, size=50)

List all created Runs inside a given Experiment.

If no Experiment is specified, list Runs for all available Experiments.

Parameters:
  • experiment_name (Optional[str]) – The name of the Experiment.

  • page (int) – The page index.

  • size (int) – Number of Runs to retrieve per page.

Return type:

List[Run]

Returns:

List of Runs.

Note

page and size work together. For example, page=1 and size=50 returns the first 50 runs. To get runs 51-100, set page=2.

Note

The maximum value of size is 100.
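The interaction between page and size can be sketched as follows. page_bounds is a hypothetical helper; it assumes pages are 1-based (as the default page=1 suggests) and enforces the documented maximum of 100 items per page.

```python
from typing import Tuple

MAX_PAGE_SIZE = 100  # documented maximum value of size


def page_bounds(page: int, size: int) -> Tuple[int, int]:
    """Return the 1-based (first, last) item positions covered by page and size."""
    if page < 1:
        raise ValueError("page is 1-based")
    if not 1 <= size <= MAX_PAGE_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_PAGE_SIZE}")
    first = (page - 1) * size + 1
    return first, first + size - 1
```

So page=1, size=50 covers runs 1 to 50, and page=2, size=50 covers runs 51 to 100.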

finetuner.delete_run(run_name, experiment_name=None)

Delete a Run given a run_name and an optional experiment_name.

If an experiment name is not specified, we’ll look for the run in the default experiment.

Parameters:
  • run_name (str) – Name of the run. View your runs with list_runs.

  • experiment_name (Optional[str]) – Optional name of the experiment.

Return type:

None

finetuner.delete_runs(experiment_name=None)

Delete all Runs, given an optional experiment_name.

If an experiment name is not specified, we’ll delete every run across all experiments.

Parameters:

experiment_name (Optional[str]) – Optional name of the experiment. View your experiment names with list_experiments().

Return type:

None
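Note the asymmetry between get_run/delete_run and delete_runs when no experiment name is passed: the former look only in the default experiment, while delete_runs removes runs from every experiment. The in-memory model below is a hypothetical stand-in that illustrates these lookup rules; RunRegistry is not a finetuner class, and the default experiment name "default" is an assumption taken from the create_experiment() signature.

```python
from typing import Dict, List, Optional

# Assumption: the default experiment is called "default", as in create_experiment().
DEFAULT_EXPERIMENT = "default"


class RunRegistry:
    """Hypothetical in-memory stand-in for the run lookup rules described above."""

    def __init__(self) -> None:
        self._runs: Dict[str, List[str]] = {}

    def _resolve(self, experiment_name: Optional[str]) -> str:
        return DEFAULT_EXPERIMENT if experiment_name is None else experiment_name

    def add(self, run_name: str, experiment_name: Optional[str] = None) -> None:
        self._runs.setdefault(self._resolve(experiment_name), []).append(run_name)

    def get_run(self, run_name: str, experiment_name: Optional[str] = None) -> str:
        # No experiment name: only the default experiment is searched.
        exp = self._resolve(experiment_name)
        if run_name not in self._runs.get(exp, []):
            raise KeyError(f"run {run_name!r} not found in experiment {exp!r}")
        return f"{exp}/{run_name}"

    def delete_run(self, run_name: str, experiment_name: Optional[str] = None) -> None:
        # Same lookup rule as get_run.
        exp = self._resolve(experiment_name)
        runs = self._runs.get(exp, [])
        if run_name not in runs:
            raise KeyError(f"run {run_name!r} not found in experiment {exp!r}")
        runs.remove(run_name)

    def delete_runs(self, experiment_name: Optional[str] = None) -> None:
        if experiment_name is None:
            # No experiment name: every run in every experiment is deleted.
            for runs in self._runs.values():
                runs.clear()
        elif experiment_name in self._runs:
            self._runs[experiment_name].clear()

    def all_runs(self) -> List[str]:
        return sorted(f"{exp}/{run}" for exp, runs in self._runs.items() for run in runs)
```

In this model, a run created in my-exp cannot be found by get_run("run-b") without passing experiment_name, yet a bare delete_runs() call still removes it.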

finetuner.create_experiment(name='default')

Create an Experiment.

Parameters:

name (str) – The name of the experiment. If not provided, the experiment is named default.

Return type:

Experiment

Returns:

An Experiment object.

finetuner.get_experiment(name)

Get an Experiment given a name.

Parameters:

name (str) – Name of the experiment.

Return type:

Experiment

Returns:

An Experiment object.

finetuner.list_experiments(page=1, size=50)

List all Experiments.

Parameters:
  • page (int) – The page index.

  • size (int) – The number of experiments to retrieve.

Return type:

List[Experiment]

Returns:

A list of Experiment instances.

Note

page and size work together. For example, page=1 and size=50 returns the first 50 experiments. To get experiments 51-100, set page=2.

Note

The maximum value of size is 100.
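To retrieve every experiment rather than a single page, a client can request consecutive pages until a page comes back with fewer than size items. The sketch below shows that generic pagination loop; iterate_all and the fetch_page callable are hypothetical, and the stopping rule assumes the last page is returned short or empty.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iterate_all(fetch_page: Callable[[int, int], List[T]], size: int = 100) -> Iterator[T]:
    """Yield every item by requesting page=1, 2, ... until a short page is returned.

    fetch_page(page, size) is assumed to return at most `size` items for that page.
    """
    page = 1
    while True:
        items = fetch_page(page, size)
        yield from items
        if len(items) < size:
            return  # a short (or empty) page means there is nothing further
        page += 1
```

With the real client, fetch_page could be lambda page, size: finetuner.list_experiments(page=page, size=size), using the documented maximum size of 100 to minimize the number of requests.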

finetuner.delete_experiment(name)

Delete an Experiment given a name.

Parameters:

name (str) – Name of the experiment. View your experiment names with list_experiments().

Return type:

Experiment

Returns:

The deleted experiment.

finetuner.delete_experiments()

Delete all Experiments.

Return type:

List[Experiment]

Returns:

List of deleted experiments.

finetuner.get_token()

Get the user token from Jina AI Cloud. Requires a prior call to login().

Return type:

str

Returns:

The user token as a string.

finetuner.build_model(name, model_options=None, batch_size=32, select_model=None, device=None, is_onnx=False)

Builds a pre-trained model given a name.

Parameters:
  • name (str) – Refers to a pre-trained model, see https://finetuner.jina.ai/walkthrough/choose-backbone/ or use the finetuner.describe_models() function for a list of all supported models.

  • model_options (Optional[Dict[str, Any]]) – A dictionary of model specific options.

  • batch_size (int) – Incoming documents are fed to the graph in batches, both to speed up inference and to avoid memory errors. This argument controls the number of documents put in each batch.

  • select_model (Optional[str]) – Finetuner run artifacts might contain multiple models. In such cases you can select which model to deploy using this argument. For CLIP fine-tuning, you can choose either clip-vision or clip-text.

  • device (Optional[str]) – The device to run on. If set to cuda, an NVIDIA GPU is used; set it to cpu to run on the CPU.

  • is_onnx (bool) – The model format: ONNX if True, PyTorch (pt) otherwise.

Return type:

InferenceEngine

Returns:

An instance of TorchInferenceEngine or ONNXRuntimeInferenceEngine.

finetuner.get_model(artifact, token=None, batch_size=32, select_model=None, device=None, logging_level='WARNING', is_onnx=False)

Rebuild a fine-tuned model from a Finetuner run artifact for inference, using PyTorch or, if is_onnx is set, an ONNX Runtime inference session.

Parameters:
  • artifact (str) – Specify a finetuner run artifact. Can be a path to a local directory, a path to a local zip file, or a Hubble artifact ID. Individual model artifacts (model sub-folders inside the run artifacts) can also be specified using this argument.

  • token (Optional[str]) – A Jina authentication token required for pulling artifacts from Hubble. If not provided, the Hubble client will try to find one either in a local cache folder or in the environment.

  • batch_size (int) – Incoming documents are fed to the graph in batches, both to speed up inference and to avoid memory errors. This argument controls the number of documents put in each batch.

  • select_model (Optional[str]) – Finetuner run artifacts might contain multiple models. In such cases you can select which model to deploy using this argument. For CLIP fine-tuning, you can choose either clip-vision or clip-text.

  • device (Optional[str]) – The device to run on. If set to cuda, an NVIDIA GPU is used; set it to cpu to run on the CPU.

  • logging_level (str) – The executor logging level. See https://docs.python.org/3/library/logging.html#logging-levels for available options.

  • is_onnx (bool) – The model format: ONNX if True, PyTorch (pt) otherwise.

Return type:

InferenceEngine

Returns:

An instance of TorchInferenceEngine or, if is_onnx is True, ONNXRuntimeInferenceEngine.

Note

Please install finetuner[full] to include all the dependencies.
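The artifact argument accepts a local directory, a local zip file, or a Hubble artifact ID. The sketch below shows one plausible way to tell these apart using only the standard library. classify_artifact is a hypothetical helper, and treating every string that is not an existing local path as an artifact ID is an assumption made for illustration, not Finetuner's documented resolution logic.

```python
import os
import zipfile


def classify_artifact(artifact: str) -> str:
    """Hypothetical: label an artifact argument as 'directory', 'zip' or 'hubble_id'."""
    if os.path.isdir(artifact):
        return "directory"  # e.g. an unpacked run artifact or a model sub-folder
    if os.path.isfile(artifact):
        if zipfile.is_zipfile(artifact):
            return "zip"
        raise ValueError(f"{artifact!r} is a file but not a zip archive")
    # Assumption for this sketch: anything that is not a local path is an artifact ID.
    return "hubble_id"
```

A local path therefore takes precedence over an artifact ID with the same spelling, which is one reason to keep downloaded artifacts in clearly named directories.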

finetuner.encode(model, data, batch_size=32)

Preprocess, collate and encode a list of strings or a DocumentArray, producing embeddings.

Parameters:
  • model (InferenceEngine) – The model used to encode the data: an instance of ONNXRuntimeInferenceEngine or TorchInferenceEngine produced by finetuner.get_model().

  • data (Union[DocumentArray, List[str]]) – The DocumentArray or list of strings to be encoded.

  • batch_size (int) – Incoming documents are fed to the graph in batches, both to speed up inference and to avoid memory errors. This argument controls the number of documents put in each batch.

Return type:

Union[DocumentArray, ForwardRef]

Returns:

DocumentArray filled with embeddings.

Note

Please install finetuner[full] to include all the dependencies.
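The batch_size argument shared by build_model, get_model and encode controls how many documents are passed to the model at once. The sketch below illustrates that batching pattern with a toy encoder standing in for the model; batched, encode_in_batches and the toy encoder are hypothetical and do not reproduce Finetuner's preprocessing or collation steps.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encode_in_batches(
    encode_batch: Callable[[Sequence[str]], List[List[float]]],
    data: Sequence[str],
    batch_size: int = 32,
) -> List[List[float]]:
    """Encode data one batch at a time, keeping peak memory bounded by batch_size."""
    embeddings: List[List[float]] = []
    for batch in batched(data, batch_size):
        embeddings.extend(encode_batch(batch))
    return embeddings
```

With 70 inputs and the default batch_size=32, the encoder is called three times, with batches of 32, 32 and 6 items. With the real library, the equivalent call would be finetuner.encode(model=model, data=texts, batch_size=32), where model comes from finetuner.get_model().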