finetuner package#

Subpackages#

Submodules#

Module contents#

finetuner.login()[source]#
finetuner.connect()[source]#
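
Both functions above take no arguments, so a session is typically started as in the sketch below (whether login() opens a browser-based flow or reuses a locally cached token depends on your environment):

    import finetuner

    # Authenticate before creating runs or pulling artifacts.
    finetuner.login()

    # Alternatively, re-attach using already available credentials,
    # e.g. inside a non-interactive script.
    finetuner.connect()
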
finetuner.list_callbacks()[source]#

List available callbacks.

Return type

Dict[str, CallbackStubType]

finetuner.list_models()[source]#

List available models for training.

Return type

List[str]

finetuner.list_model_options()[source]#

List available options per model.

Return type

Dict[str, List[Dict[str, Any]]]

finetuner.describe_models()[source]#

Describe available models in a table.

Return type

None
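
Taken together, these helpers let you discover what can be fine-tuned before calling fit(). A minimal sketch (the exact model names returned depend on your Finetuner version):

    import finetuner

    finetuner.describe_models()               # prints a table of available models

    models = finetuner.list_models()          # model names, e.g. 'resnet50' (illustrative)
    options = finetuner.list_model_options()  # dict keyed by model name
    print(options[models[0]])                 # construction options for the first model

    print(finetuner.list_callbacks())         # available callback stubs, keyed by name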

finetuner.fit(model, train_data, eval_data=None, run_name=None, description=None, experiment_name=None, model_options=None, loss='TripletMarginLoss', miner=None, miner_options=None, optimizer='Adam', optimizer_options=None, learning_rate=None, epochs=5, batch_size=64, callbacks=None, scheduler_step='batch', freeze=False, output_dim=None, cpu=True, num_workers=4, to_onnx=False)#

Start a finetuner run!

Parameters
  • model (str) – The name of model to be fine-tuned. Run finetuner.list_models() or finetuner.describe_models() to see the available model names.

  • train_data (Union[str, DocumentArray]) – Either a DocumentArray with training data or the name of a DocumentArray that has been pushed to Hubble.

  • eval_data (Union[str, DocumentArray, None]) – Either a DocumentArray with evaluation data or the name of a DocumentArray that has been pushed to Hubble.

  • run_name (Optional[str]) – Name of the run.

  • description (Optional[str]) – Run description.

  • experiment_name (Optional[str]) – Name of the experiment.

  • model_options (Optional[Dict[str, Any]]) – Additional arguments to pass to the model construction. These are model specific options and are different depending on the model you choose. Run finetuner.list_model_options() to see available options for every model.

  • loss (str) – Name of the loss function used for fine-tuning. Default is TripletMarginLoss. Options: CosFaceLoss, NTXLoss, AngularLoss, ArcFaceLoss, BaseMetricLossFunction, MultipleLosses, CentroidTripletLoss, CircleLoss, ContrastiveLoss, CrossBatchMemory, FastAPLoss, GenericPairLoss, IntraPairVarianceLoss, LargeMarginSoftmaxLoss, GeneralizedLiftedStructureLoss, LiftedStructureLoss, MarginLoss, EmbeddingRegularizerMixin, WeightRegularizerMixin, MultiSimilarityLoss, NPairsLoss, NCALoss, NormalizedSoftmaxLoss, ProxyAnchorLoss, ProxyNCALoss, SignalToNoiseRatioContrastiveLoss, SoftTripleLoss, SphereFaceLoss, SupConLoss, TripletMarginLoss, TupletMarginLoss, VICRegLoss, CLIPLoss.

  • miner (Optional[str]) – Name of the miner to create tuple indices for the loss function. Options: AngularMiner, BaseMiner, BaseSubsetBatchMiner, BaseTupleMiner, BatchEasyHardMiner, BatchHardMiner, DistanceWeightedMiner, HDCMiner, EmbeddingsAlreadyPackagedAsTriplets, MaximumLossMiner, PairMarginMiner, MultiSimilarityMiner, TripletMarginMiner, UniformHistogramMiner.

  • miner_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the miner construction. The set of applicable parameters is specific to the miner you choose. Details on the parameters can be found in the PyTorch Metric Learning documentation.

  • optimizer (str) – Name of the optimizer used for fine-tuning. Options: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, SGD.

  • optimizer_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the optimizer construction. The set of applicable parameters is specific to the optimizer you choose. Details on the parameters can be found in the PyTorch documentation.

  • learning_rate (Optional[float]) – Learning rate for the optimizer.

  • epochs (int) – Number of epochs for fine-tuning.

  • batch_size (int) – Number of items to include in a batch.

  • callbacks (Optional[List[CallbackStubType]]) – List of callback stub objects. See the finetuner.callback subpackage for available options, or run finetuner.list_callbacks().

  • scheduler_step (str) – The interval at which the learning rate scheduler’s step function is called. Valid options are batch and epoch.

  • freeze (bool) – If set to True, will freeze all layers except the last one.

  • output_dim (Optional[int]) – The expected output dimension as int. If set, will attach a projection head.

  • cpu (bool) – Whether to use the CPU. If set to False a GPU will be used.

  • num_workers (int) – Number of CPU workers. If cpu=False, this is the number of workers used by the dataloader.

  • to_onnx (bool) – If set to True, the fine-tuned model is converted to and stored in ONNX format. When loading such a model with get_model, set is_onnx=True.

Return type

Run
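
A minimal sketch of a fit() call. The model name and the Hubble dataset names are illustrative placeholders; check finetuner.list_models() for real model names:

    import finetuner

    finetuner.login()

    run = finetuner.fit(
        model='resnet50',               # illustrative; see finetuner.list_models()
        train_data='my-training-data',  # hypothetical name of a DocumentArray pushed to Hubble
        eval_data='my-eval-data',       # hypothetical, optional
        loss='TripletMarginLoss',
        epochs=5,
        batch_size=64,
        cpu=False,                      # fine-tune on a GPU
    )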

finetuner.create_run(model, train_data, eval_data=None, run_name=None, description=None, experiment_name=None, model_options=None, loss='TripletMarginLoss', miner=None, miner_options=None, optimizer='Adam', optimizer_options=None, learning_rate=None, epochs=5, batch_size=64, callbacks=None, scheduler_step='batch', freeze=False, output_dim=None, cpu=True, num_workers=4, to_onnx=False)#

Start a finetuner run!

Parameters
  • model (str) – The name of model to be fine-tuned. Run finetuner.list_models() or finetuner.describe_models() to see the available model names.

  • train_data (Union[str, DocumentArray]) – Either a DocumentArray with training data or the name of a DocumentArray that has been pushed to Hubble.

  • eval_data (Union[str, DocumentArray, None]) – Either a DocumentArray with evaluation data or the name of a DocumentArray that has been pushed to Hubble.

  • run_name (Optional[str]) – Name of the run.

  • description (Optional[str]) – Run description.

  • experiment_name (Optional[str]) – Name of the experiment.

  • model_options (Optional[Dict[str, Any]]) – Additional arguments to pass to the model construction. These are model specific options and are different depending on the model you choose. Run finetuner.list_model_options() to see available options for every model.

  • loss (str) – Name of the loss function used for fine-tuning. Default is TripletMarginLoss. Options: CosFaceLoss, NTXLoss, AngularLoss, ArcFaceLoss, BaseMetricLossFunction, MultipleLosses, CentroidTripletLoss, CircleLoss, ContrastiveLoss, CrossBatchMemory, FastAPLoss, GenericPairLoss, IntraPairVarianceLoss, LargeMarginSoftmaxLoss, GeneralizedLiftedStructureLoss, LiftedStructureLoss, MarginLoss, EmbeddingRegularizerMixin, WeightRegularizerMixin, MultiSimilarityLoss, NPairsLoss, NCALoss, NormalizedSoftmaxLoss, ProxyAnchorLoss, ProxyNCALoss, SignalToNoiseRatioContrastiveLoss, SoftTripleLoss, SphereFaceLoss, SupConLoss, TripletMarginLoss, TupletMarginLoss, VICRegLoss, CLIPLoss.

  • miner (Optional[str]) – Name of the miner to create tuple indices for the loss function. Options: AngularMiner, BaseMiner, BaseSubsetBatchMiner, BaseTupleMiner, BatchEasyHardMiner, BatchHardMiner, DistanceWeightedMiner, HDCMiner, EmbeddingsAlreadyPackagedAsTriplets, MaximumLossMiner, PairMarginMiner, MultiSimilarityMiner, TripletMarginMiner, UniformHistogramMiner.

  • miner_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the miner construction. The set of applicable parameters is specific to the miner you choose. Details on the parameters can be found in the PyTorch Metric Learning documentation.

  • optimizer (str) – Name of the optimizer used for fine-tuning. Options: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, SGD.

  • optimizer_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the optimizer construction. The set of applicable parameters is specific to the optimizer you choose. Details on the parameters can be found in the PyTorch documentation.

  • learning_rate (Optional[float]) – Learning rate for the optimizer.

  • epochs (int) – Number of epochs for fine-tuning.

  • batch_size (int) – Number of items to include in a batch.

  • callbacks (Optional[List[CallbackStubType]]) – List of callback stub objects. See the finetuner.callback subpackage for available options, or run finetuner.list_callbacks().

  • scheduler_step (str) – The interval at which the learning rate scheduler’s step function is called. Valid options are batch and epoch.

  • freeze (bool) – If set to True, will freeze all layers except the last one.

  • output_dim (Optional[int]) – The expected output dimension as int. If set, will attach a projection head.

  • cpu (bool) – Whether to use the CPU. If set to False a GPU will be used.

  • num_workers (int) – Number of CPU workers. If cpu=False, this is the number of workers used by the dataloader.

  • to_onnx (bool) – If set to True, the fine-tuned model is converted to and stored in ONNX format. When loading such a model with get_model, set is_onnx=True.

Return type

Run
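
create_run() accepts the same arguments as fit(). A sketch of a run that also passes a miner and a callback; the EvaluationCallback import path and its query_data parameter are assumptions, so check finetuner.list_callbacks() for what is actually available:

    import finetuner
    from finetuner.callback import EvaluationCallback  # assumed import path

    run = finetuner.create_run(
        model='resnet50',                               # illustrative model name
        train_data='my-training-data',                  # hypothetical Hubble dataset name
        loss='TripletMarginLoss',
        miner='TripletMarginMiner',
        miner_options={'margin': 0.2},                  # valid keys depend on the chosen miner
        optimizer='Adam',
        learning_rate=1e-4,
        callbacks=[EvaluationCallback(query_data='my-eval-data')],  # assumed callback signature
    )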

finetuner.get_run(run_name, experiment_name=None)[source]#

Get run by its name and (optional) experiment.

If an experiment name is not specified, we’ll look for the run in the default experiment.

Parameters
  • run_name (str) – Name of the run.

  • experiment_name (Optional[str]) – Optional name of the experiment.

Return type

Run

Returns

A Run object.
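
Runs execute in the cloud, so a common pattern is to re-attach to a run later and inspect it. A sketch, assuming the Run object exposes status, log, and artifact-download accessors (these method names are assumptions, not guaranteed by this page):

    import finetuner

    finetuner.login()
    run = finetuner.get_run('my-run-name')   # hypothetical run name, default experiment

    # The accessors below are assumptions about the Run interface,
    # shown only to illustrate a typical monitoring flow.
    print(run.status())
    print(run.logs())
    run.save_artifact('model')               # download the fine-tuned model artifact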

finetuner.list_runs(experiment_name=None)[source]#

List every run.

If an experiment name is not specified, we’ll list every run across all experiments.

Parameters

experiment_name (Optional[str]) – Optional name of the experiment.

Return type

List[Run]

Returns

A list of Run objects.
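
For example, to inspect every run inside one experiment (the experiment name is hypothetical, and the name attribute on Run is an assumption):

    import finetuner

    for run in finetuner.list_runs(experiment_name='my-experiment'):
        print(run.name)   # assumes Run exposes a name attribute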

finetuner.delete_run(run_name, experiment_name=None)[source]#

Delete a run.

If an experiment name is not specified, we’ll look for the run in the default experiment.

Parameters
  • run_name (str) – Name of the run. View your runs with list_runs().

  • experiment_name (Optional[str]) – Optional name of the experiment.

Return type

None

finetuner.delete_runs(experiment_name=None)[source]#

Delete every run.

If an experiment name is not specified, we’ll delete every run across all experiments.

Parameters

experiment_name (Optional[str]) – Optional name of the experiment. View your experiment names with list_experiments().

Return type

None
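
A small cleanup sketch combining the two deletion helpers (run and experiment names are hypothetical):

    import finetuner

    # Delete a single run from the default experiment.
    finetuner.delete_run('my-old-run')

    # Delete every run inside one specific experiment.
    finetuner.delete_runs(experiment_name='my-experiment')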

finetuner.create_experiment(name=None)[source]#

Create an experiment.

Parameters

name (Optional[str]) – Optional name of the experiment. If None, the experiment is named after the current directory.

Return type

Experiment

Returns

An Experiment object.

finetuner.get_experiment(name)[source]#

Get an experiment by its name.

Parameters

name (str) – Name of the experiment.

Return type

Experiment

Returns

An Experiment object.

finetuner.list_experiments()[source]#

List every experiment.

Return type

List[Experiment]
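
Experiments group runs together. A sketch of the typical lifecycle (the experiment name is hypothetical):

    import finetuner

    exp = finetuner.create_experiment('image-search')   # hypothetical experiment name
    same_exp = finetuner.get_experiment('image-search')

    for experiment in finetuner.list_experiments():
        print(experiment)                                # Experiment objects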

finetuner.delete_experiment(name)[source]#

Delete an experiment by its name.

Parameters

name (str) – Name of the experiment. View your experiment names with list_experiments().

Return type

Experiment

Returns

Deleted experiment.

finetuner.delete_experiments()[source]#

Delete every experiment.

Return type

List[Experiment]

Returns

List of deleted experiments.

finetuner.get_token()[source]#

Get the user token of the Jina ecosystem.

Return type

str

Returns

The user token as a string.
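
The token can be forwarded to functions that pull artifacts from Hubble, for example get_model() below (the artifact ID is hypothetical):

    import finetuner

    finetuner.login()
    token = finetuner.get_token()

    # Pass the token explicitly when pulling a run artifact.
    model = finetuner.get_model('my-artifact-id', token=token)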

finetuner.get_model(artifact, token=None, batch_size=32, select_model=None, gpu=False, logging_level='WARNING', is_onnx=False)[source]#

Re-build a fine-tuned model from a run artifact and load it into an inference session.

Parameters
  • artifact (str) – Specify a finetuner run artifact. Can be a path to a local directory, a path to a local zip file, or a Hubble artifact ID. Individual model artifacts (model sub-folders inside the run artifacts) can also be specified using this argument.

  • token (Optional[str]) – A Jina authentication token required for pulling artifacts from Hubble. If not provided, the Hubble client will try to find one either in a local cache folder or in the environment.

  • batch_size (int) – Incoming documents are fed to the graph in batches, both to speed-up inference and avoid memory errors. This argument controls the number of documents that will be put in each batch.

  • select_model (Optional[str]) – Finetuner run artifacts might contain multiple models. In such cases you can select which model to deploy using this argument. For CLIP fine-tuning, you can choose either clip-vision or clip-text.

  • gpu (bool) – If set to True, a CUDA device is used for inference.

  • logging_level (str) – The executor logging level. See https://docs.python.org/3/library/logging.html#logging-levels for available options.

  • is_onnx (bool) – The model output format: True for ONNX, False for PyTorch (pt). Set this to True if the run was created with to_onnx=True.

Returns

An instance of ONNXRuntimeInferenceEngine.

Note

Please install "finetuner[full]" to include all the dependencies.
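
A sketch of loading a fine-tuned model from a run artifact. The artifact IDs are hypothetical, and select_model is only needed for multi-model artifacts such as CLIP:

    import finetuner

    finetuner.login()

    # Load a single-model artifact.
    model = finetuner.get_model('my-artifact-id', gpu=False)

    # CLIP runs produce two models; pick one with select_model.
    clip_text = finetuner.get_model(
        'my-clip-artifact-id',        # hypothetical artifact ID or local path
        select_model='clip-text',
        is_onnx=False,
    )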

finetuner.encode(model, data, batch_size=32)[source]#

Preprocess, collate and encode the DocumentArray with embeddings.

Parameters
  • model – The model used to encode the DocumentArray: an instance of ONNXRuntimeInferenceEngine or TorchInferenceEngine produced by finetuner.get_model().

  • data (DocumentArray) – The DocumentArray object to be encoded.

  • batch_size (int) – Incoming documents are fed to the graph in batches, both to speed-up inference and avoid memory errors. This argument controls the number of documents that will be put in each batch.

Returns

DocumentArray filled with embeddings.

Note

Please install "finetuner[full]" to include all the dependencies.
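
Putting get_model() and encode() together, a sketch of embedding a DocumentArray locally. It assumes a text model, a hypothetical artifact ID, and docarray installed via the "finetuner[full]" extra:

    import finetuner
    from docarray import Document, DocumentArray

    model = finetuner.get_model('my-artifact-id')            # hypothetical artifact ID

    data = DocumentArray([Document(text='hello'), Document(text='world')])
    embedded = finetuner.encode(model=model, data=data, batch_size=32)

    print(embedded.embeddings.shape)                         # one embedding per document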