finetuner package#
Subpackages#
Submodules#
- finetuner.console module
- finetuner.data module
- finetuner.excepts module
- finetuner.experiment module
- finetuner.finetuner module
  - Finetuner
    - Finetuner.login()
    - Finetuner.create_experiment()
    - Finetuner.get_experiment()
    - Finetuner.list_experiments()
    - Finetuner.delete_experiment()
    - Finetuner.delete_experiments()
    - Finetuner.create_training_run()
    - Finetuner.create_synthesis_run()
    - Finetuner.get_run()
    - Finetuner.list_runs()
    - Finetuner.delete_run()
    - Finetuner.delete_runs()
    - Finetuner.get_token()
- finetuner.model module
- finetuner.run module
Module contents#
- finetuner.login(force=False, interactive=None)[source]#
Login to Jina AI Cloud to use cloud-based fine-tuning. This generates an authentication token which can be read with the get_token() function.
- Parameters:
  - force (bool) – If set to True, an existing token will be overwritten. Otherwise, you will not log in again if a valid token already exists.
  - interactive (Optional[bool]) – Interactive mode; should be set in Jupyter environments.
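A minimal usage sketch of the login flow, assuming you have a Jina AI Cloud account (interactive mode is only needed in notebook environments):

```python
import finetuner

# Log in to Jina AI Cloud; this generates and caches an authentication token.
finetuner.login()

# The token can then be read back for use with other Jina AI Cloud services.
token = finetuner.get_token()
print(token)
```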
- finetuner.list_callbacks()[source]#
List available callbacks.
- Return type:
  Dict[str, ~CallbackStubType]
- finetuner.list_model_options()[source]#
List available options per model.
- Return type:
  Dict[str, List[Dict[str, Any]]]
- finetuner.describe_models(task=None)[source]#
Print model information, such as name, task, output dimension, architecture and description as a table.
- Parameters:
  - task (Optional[str]) – The task for the backbone model, one of text-to-text, text-to-image, image-to-image. If not provided, all backbone models will be printed.
- Return type:
  None
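A short sketch of the discovery helpers above; the task value is one of the options listed in the describe_models() parameter description:

```python
import finetuner

# Available callback names and their stub classes.
print(list(finetuner.list_callbacks().keys()))

# Model-specific construction options, keyed by model name.
options = finetuner.list_model_options()
print(list(options.keys())[:5])

# Print a table of backbone models for a single task.
finetuner.describe_models(task='text-to-text')
```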
- finetuner.fit(model, train_data, eval_data=None, val_split=0.0, model_artifact=None, run_name=None, description=None, experiment_name=None, model_options=None, loss='TripletMarginLoss', miner=None, miner_options=None, optimizer='Adam', optimizer_options=None, learning_rate=None, epochs=5, batch_size=None, callbacks=None, scheduler=None, scheduler_options=None, freeze=False, output_dim=None, device='cuda', num_workers=4, to_onnx=False, csv_options=None, public=False, num_items_per_class=4, sampler='auto', loss_optimizer=None, loss_optimizer_options=None)#
Create a Finetuner training Run. Calling this function will submit a fine-tuning job to the Jina AI Cloud.
- Parameters:
  - model (str) – The name of the model to be fine-tuned. Run finetuner.list_models() or finetuner.describe_models() to see the available model names.
  - train_data (Union[str, TextIO, DocumentArray]) – Either a DocumentArray for training data, the name of a DocumentArray that is pushed to Jina AI Cloud, or a path to a CSV file.
  - eval_data (Union[str, TextIO, DocumentArray, None]) – Either a DocumentArray for evaluation data, the name of a DocumentArray that is pushed to Jina AI Cloud, or a path to a CSV file.
  - val_split (float) – Determines which portion of the train_data is held out for calculating a validation loss. If it is set to 0, or an eval_data parameter is provided, no data is held out from the training data. Instead, the eval_data is used to calculate the validation loss if it is provided.
  - model_artifact (Optional[str]) – To continue the training of a model which was fine-tuned by a previous run, you can provide the artifact id of this model, which you can get via Run.artifact_id().
  - run_name (Optional[str]) – Name of the run.
  - description (Optional[str]) – Run description.
  - experiment_name (Optional[str]) – Name of the experiment.
  - model_options (Optional[Dict[str, Any]]) – Additional arguments to pass to the model construction. These are model-specific options and differ depending on the model you choose. Run finetuner.list_model_options() to see the available options for every model.
  - loss (str) – Name of the loss function used for fine-tuning. Default is TripletMarginLoss. Options: CosFaceLoss, NTXLoss, AngularLoss, ArcFaceLoss, BaseMetricLossFunction, MultipleLosses, CentroidTripletLoss, CircleLoss, ContrastiveLoss, CrossBatchMemory, FastAPLoss, GenericPairLoss, IntraPairVarianceLoss, LargeMarginSoftmaxLoss, GeneralizedLiftedStructureLoss, LiftedStructureLoss, MarginLoss, EmbeddingRegularizerMixin, WeightRegularizerMixin, MultiSimilarityLoss, NPairsLoss, NCALoss, NormalizedSoftmaxLoss, ProxyAnchorLoss, ProxyNCALoss, SignalToNoiseRatioContrastiveLoss, SoftTripleLoss, SphereFaceLoss, SupConLoss, TripletMarginLoss, TupletMarginLoss, VICRegLoss, CLIPLoss.
  - miner (Optional[str]) – Name of the miner to create tuple indices for the loss function. Options: AngularMiner, BaseMiner, BaseSubsetBatchMiner, BaseTupleMiner, BatchEasyHardMiner, BatchHardMiner, DistanceWeightedMiner, HDCMiner, EmbeddingsAlreadyPackagedAsTriplets, MaximumLossMiner, PairMarginMiner, MultiSimilarityMiner, TripletMarginMiner, UniformHistogramMiner.
  - miner_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the miner construction. The set of applicable parameters is specific to the miner you choose. Details on the parameters can be found in the PyTorch Metric Learning documentation.
  - optimizer (str) – Name of the optimizer used for fine-tuning. Options: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, SGD.
  - optimizer_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the optimizer construction. The set of applicable parameters is specific to the optimizer you choose. Details on the parameters can be found in the PyTorch documentation.
  - learning_rate (Optional[float]) – Learning rate for the optimizer.
  - epochs (int) – Number of epochs for fine-tuning.
  - batch_size (Optional[int]) – Number of items to include in a batch. If not set, the batch size will be configured automatically.
  - callbacks (Optional[List[~CallbackStubType]]) – List of callback stub objects. See the finetuner.callback subpackage for available options, or run finetuner.list_callbacks().
  - scheduler (Optional[str]) – Name of a scheduler to use for learning rate scheduling. Supported types are: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup.
  - scheduler_options (Optional[Dict[str, Any]]) – Dictionary of additional parameters to pass to the scheduler: num_warmup_steps, num_training_steps, and scheduler_step (either batch or epoch).
  - freeze (bool) – If set to True, will freeze all layers except the last one.
  - output_dim (Optional[int]) – The expected output dimension as int. If set, will attach a projection head.
  - device (str) – The device to run the job on. If set to cuda, an Nvidia GPU will be used; set it to cpu to run a CPU job.
  - num_workers (int) – Number of CPU workers. If cpu: False, this is the number of workers used by the dataloader.
  - to_onnx (bool) – Set this parameter to True to convert the model to an ONNX model. Please note that not all models support this. If this parameter is set, please pass is_onnx when making inference, e.g., when calling the get_model function.
  - csv_options (Optional[CSVOptions]) – A CSVOptions object containing options used for reading in training and evaluation data from a CSV file, if they are provided as such.
  - public (bool) – A boolean value indicating whether the artifact is public. It should be set to True if you would like to share your fine-tuned model with others.
  - num_items_per_class (int) – How many items per class (unique labels) to include in a batch. For example, if batch_size is 20 and num_items_per_class is 4, the batch will consist of 4 items for each of the 5 classes. Batch size must be divisible by num_items_per_class.
  - sampler (str) – Determines which sampling method will be used if the data is labeled. Default is auto, meaning that the sampler will be the default for the loss function used. Setting it to class will result in the ClassSampler being used, and setting it to random will result in the RandomSampler being used. If set to random, then num_items_per_class is not used.
  - loss_optimizer (Optional[str]) – Name of the optimizer used for fine-tuning the loss function, if it is a function that requires an optimizer. Options: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, SGD. If left as None, the optimizer specified by the optimizer argument will be used instead.
  - loss_optimizer_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the optimizer of the loss function. The set of applicable parameters is specific to the optimizer you choose. Details on the parameters can be found in the PyTorch documentation.
Note
Unless necessary, please stick with device="cuda"; CPU training could be extremely slow and inefficient.
- Return type:
  Run
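A minimal sketch of a typical fit() call. It assumes you are logged in and have a labeled train.csv; the model name, file name, and chosen hyperparameters are illustrative, not prescriptive:

```python
import finetuner

finetuner.login()

# Submit a fine-tuning job to the Jina AI Cloud and get back a Run handle.
run = finetuner.fit(
    model='bert-base-cased',      # pick a name from finetuner.describe_models()
    train_data='train.csv',       # DocumentArray, pushed dataset name, or CSV path
    val_split=0.2,                # hold out 20% of train_data for validation
    loss='TripletMarginLoss',
    optimizer='Adam',
    epochs=5,
    device='cuda',
)

print(run.name)
print(run.status())               # poll the run status on the cloud
```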
- finetuner.synthesize(query_data, corpus_data, models, num_relations=10, run_name=None, description=None, experiment_name=None, device='cuda', num_workers=4, csv_options=None, public=False)#
Create a Finetuner synthesis Run. Calling this function will submit a data synthesis job to the Jina AI Cloud.
- Parameters:
  - query_data (Union[str, List[str], DocumentArray]) – Either a DocumentArray of example queries, the name of a DocumentArray that is pushed to Jina AI Cloud, the dataset itself as a list of strings, or a path to a CSV file.
  - corpus_data (Union[str, List[str], DocumentArray]) – Either a DocumentArray of corpus data, the name of a DocumentArray that is pushed to Jina AI Cloud, the dataset itself as a list of strings, or a path to a CSV file.
  - models (SynthesisModels) – A SynthesisModels object containing the names of the models used for relation mining and cross encoding. You can pass finetuner.data.DATA_SYNTHESIS_EN for the recommended models for synthesis based on English data.
  - num_relations (int) – The number of relations to mine per query.
  - run_name (Optional[str]) – Name of the run.
  - experiment_name (Optional[str]) – Name of the experiment.
  - device (str) – The device to run the job on. If set to cuda, an Nvidia GPU will be used; set it to cpu to run a CPU job.
  - num_workers (int) – Number of CPU workers. If cpu: False, this is the number of workers used by the dataloader.
  - csv_options (Optional[CSVOptions]) – A CSVOptions object containing options used for reading in training and evaluation data from a CSV file, if they are provided as such.
  - public (bool) – A boolean value indicating whether the artifact is public. It should be set to True if you would like to share your synthesized data with others.
- Param:
  description: Run description.
Note
Unless necessary, please stick with device="cuda"; CPU training could be extremely slow and inefficient.
- Return type:
  Run
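A minimal sketch of a synthesis run using the recommended English model preset mentioned above; the dataset names are illustrative:

```python
import finetuner
from finetuner.data import DATA_SYNTHESIS_EN

finetuner.login()

# Submit a data synthesis job that mines relations for each query.
synthesis_run = finetuner.synthesize(
    query_data='my-query-data',      # name of a DocumentArray pushed to Jina AI Cloud
    corpus_data='my-corpus-data',    # or a list of strings / path to a CSV file
    models=DATA_SYNTHESIS_EN,
    num_relations=10,
)

print(synthesis_run.name)
```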
- finetuner.create_training_run(model, train_data, eval_data=None, val_split=0.0, model_artifact=None, run_name=None, description=None, experiment_name=None, model_options=None, loss='TripletMarginLoss', miner=None, miner_options=None, optimizer='Adam', optimizer_options=None, learning_rate=None, epochs=5, batch_size=None, callbacks=None, scheduler=None, scheduler_options=None, freeze=False, output_dim=None, device='cuda', num_workers=4, to_onnx=False, csv_options=None, public=False, num_items_per_class=4, sampler='auto', loss_optimizer=None, loss_optimizer_options=None)#
Create a Finetuner training Run. Calling this function will submit a fine-tuning job to the Jina AI Cloud.
- Parameters:
  - model (str) – The name of the model to be fine-tuned. Run finetuner.list_models() or finetuner.describe_models() to see the available model names.
  - train_data (Union[str, TextIO, DocumentArray]) – Either a DocumentArray for training data, the name of a DocumentArray that is pushed to Jina AI Cloud, or a path to a CSV file.
  - eval_data (Union[str, TextIO, DocumentArray, None]) – Either a DocumentArray for evaluation data, the name of a DocumentArray that is pushed to Jina AI Cloud, or a path to a CSV file.
  - val_split (float) – Determines which portion of the train_data is held out for calculating a validation loss. If it is set to 0, or an eval_data parameter is provided, no data is held out from the training data. Instead, the eval_data is used to calculate the validation loss if it is provided.
  - model_artifact (Optional[str]) – To continue the training of a model which was fine-tuned by a previous run, you can provide the artifact id of this model, which you can get via Run.artifact_id().
  - run_name (Optional[str]) – Name of the run.
  - description (Optional[str]) – Run description.
  - experiment_name (Optional[str]) – Name of the experiment.
  - model_options (Optional[Dict[str, Any]]) – Additional arguments to pass to the model construction. These are model-specific options and differ depending on the model you choose. Run finetuner.list_model_options() to see the available options for every model.
  - loss (str) – Name of the loss function used for fine-tuning. Default is TripletMarginLoss. Options: CosFaceLoss, NTXLoss, AngularLoss, ArcFaceLoss, BaseMetricLossFunction, MultipleLosses, CentroidTripletLoss, CircleLoss, ContrastiveLoss, CrossBatchMemory, FastAPLoss, GenericPairLoss, IntraPairVarianceLoss, LargeMarginSoftmaxLoss, GeneralizedLiftedStructureLoss, LiftedStructureLoss, MarginLoss, EmbeddingRegularizerMixin, WeightRegularizerMixin, MultiSimilarityLoss, NPairsLoss, NCALoss, NormalizedSoftmaxLoss, ProxyAnchorLoss, ProxyNCALoss, SignalToNoiseRatioContrastiveLoss, SoftTripleLoss, SphereFaceLoss, SupConLoss, TripletMarginLoss, TupletMarginLoss, VICRegLoss, CLIPLoss.
  - miner (Optional[str]) – Name of the miner to create tuple indices for the loss function. Options: AngularMiner, BaseMiner, BaseSubsetBatchMiner, BaseTupleMiner, BatchEasyHardMiner, BatchHardMiner, DistanceWeightedMiner, HDCMiner, EmbeddingsAlreadyPackagedAsTriplets, MaximumLossMiner, PairMarginMiner, MultiSimilarityMiner, TripletMarginMiner, UniformHistogramMiner.
  - miner_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the miner construction. The set of applicable parameters is specific to the miner you choose. Details on the parameters can be found in the PyTorch Metric Learning documentation.
  - optimizer (str) – Name of the optimizer used for fine-tuning. Options: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, SGD.
  - optimizer_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the optimizer construction. The set of applicable parameters is specific to the optimizer you choose. Details on the parameters can be found in the PyTorch documentation.
  - learning_rate (Optional[float]) – Learning rate for the optimizer.
  - epochs (int) – Number of epochs for fine-tuning.
  - batch_size (Optional[int]) – Number of items to include in a batch. If not set, the batch size will be configured automatically.
  - callbacks (Optional[List[~CallbackStubType]]) – List of callback stub objects. See the finetuner.callback subpackage for available options, or run finetuner.list_callbacks().
  - scheduler (Optional[str]) – Name of a scheduler to use for learning rate scheduling. Supported types are: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup.
  - scheduler_options (Optional[Dict[str, Any]]) – Dictionary of additional parameters to pass to the scheduler: num_warmup_steps, num_training_steps, and scheduler_step (either batch or epoch).
  - freeze (bool) – If set to True, will freeze all layers except the last one.
  - output_dim (Optional[int]) – The expected output dimension as int. If set, will attach a projection head.
  - device (str) – The device to run the job on. If set to cuda, an Nvidia GPU will be used; set it to cpu to run a CPU job.
  - num_workers (int) – Number of CPU workers. If cpu: False, this is the number of workers used by the dataloader.
  - to_onnx (bool) – Set this parameter to True to convert the model to an ONNX model. Please note that not all models support this. If this parameter is set, please pass is_onnx when making inference, e.g., when calling the get_model function.
  - csv_options (Optional[CSVOptions]) – A CSVOptions object containing options used for reading in training and evaluation data from a CSV file, if they are provided as such.
  - public (bool) – A boolean value indicating whether the artifact is public. It should be set to True if you would like to share your fine-tuned model with others.
  - num_items_per_class (int) – How many items per class (unique labels) to include in a batch. For example, if batch_size is 20 and num_items_per_class is 4, the batch will consist of 4 items for each of the 5 classes. Batch size must be divisible by num_items_per_class.
  - sampler (str) – Determines which sampling method will be used if the data is labeled. Default is auto, meaning that the sampler will be the default for the loss function used. Setting it to class will result in the ClassSampler being used, and setting it to random will result in the RandomSampler being used. If set to random, then num_items_per_class is not used.
  - loss_optimizer (Optional[str]) – Name of the optimizer used for fine-tuning the loss function, if it is a function that requires an optimizer. Options: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, SGD. If left as None, the optimizer specified by the optimizer argument will be used instead.
  - loss_optimizer_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the optimizer of the loss function. The set of applicable parameters is specific to the optimizer you choose. Details on the parameters can be found in the PyTorch documentation.
Note
Unless necessary, please stick with device="cuda"; CPU training could be extremely slow and inefficient.
- Return type:
  Run
- finetuner.create_run(model, train_data, eval_data=None, val_split=0.0, model_artifact=None, run_name=None, description=None, experiment_name=None, model_options=None, loss='TripletMarginLoss', miner=None, miner_options=None, optimizer='Adam', optimizer_options=None, learning_rate=None, epochs=5, batch_size=None, callbacks=None, scheduler=None, scheduler_options=None, freeze=False, output_dim=None, device='cuda', num_workers=4, to_onnx=False, csv_options=None, public=False, num_items_per_class=4, sampler='auto', loss_optimizer=None, loss_optimizer_options=None)#
Create a Finetuner training Run. Calling this function will submit a fine-tuning job to the Jina AI Cloud.
- Parameters:
  - model (str) – The name of the model to be fine-tuned. Run finetuner.list_models() or finetuner.describe_models() to see the available model names.
  - train_data (Union[str, TextIO, DocumentArray]) – Either a DocumentArray for training data, the name of a DocumentArray that is pushed to Jina AI Cloud, or a path to a CSV file.
  - eval_data (Union[str, TextIO, DocumentArray, None]) – Either a DocumentArray for evaluation data, the name of a DocumentArray that is pushed to Jina AI Cloud, or a path to a CSV file.
  - val_split (float) – Determines which portion of the train_data is held out for calculating a validation loss. If it is set to 0, or an eval_data parameter is provided, no data is held out from the training data. Instead, the eval_data is used to calculate the validation loss if it is provided.
  - model_artifact (Optional[str]) – To continue the training of a model which was fine-tuned by a previous run, you can provide the artifact id of this model, which you can get via Run.artifact_id().
  - run_name (Optional[str]) – Name of the run.
  - description (Optional[str]) – Run description.
  - experiment_name (Optional[str]) – Name of the experiment.
  - model_options (Optional[Dict[str, Any]]) – Additional arguments to pass to the model construction. These are model-specific options and differ depending on the model you choose. Run finetuner.list_model_options() to see the available options for every model.
  - loss (str) – Name of the loss function used for fine-tuning. Default is TripletMarginLoss. Options: CosFaceLoss, NTXLoss, AngularLoss, ArcFaceLoss, BaseMetricLossFunction, MultipleLosses, CentroidTripletLoss, CircleLoss, ContrastiveLoss, CrossBatchMemory, FastAPLoss, GenericPairLoss, IntraPairVarianceLoss, LargeMarginSoftmaxLoss, GeneralizedLiftedStructureLoss, LiftedStructureLoss, MarginLoss, EmbeddingRegularizerMixin, WeightRegularizerMixin, MultiSimilarityLoss, NPairsLoss, NCALoss, NormalizedSoftmaxLoss, ProxyAnchorLoss, ProxyNCALoss, SignalToNoiseRatioContrastiveLoss, SoftTripleLoss, SphereFaceLoss, SupConLoss, TripletMarginLoss, TupletMarginLoss, VICRegLoss, CLIPLoss.
  - miner (Optional[str]) – Name of the miner to create tuple indices for the loss function. Options: AngularMiner, BaseMiner, BaseSubsetBatchMiner, BaseTupleMiner, BatchEasyHardMiner, BatchHardMiner, DistanceWeightedMiner, HDCMiner, EmbeddingsAlreadyPackagedAsTriplets, MaximumLossMiner, PairMarginMiner, MultiSimilarityMiner, TripletMarginMiner, UniformHistogramMiner.
  - miner_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the miner construction. The set of applicable parameters is specific to the miner you choose. Details on the parameters can be found in the PyTorch Metric Learning documentation.
  - optimizer (str) – Name of the optimizer used for fine-tuning. Options: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, SGD.
  - optimizer_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the optimizer construction. The set of applicable parameters is specific to the optimizer you choose. Details on the parameters can be found in the PyTorch documentation.
  - learning_rate (Optional[float]) – Learning rate for the optimizer.
  - epochs (int) – Number of epochs for fine-tuning.
  - batch_size (Optional[int]) – Number of items to include in a batch. If not set, the batch size will be configured automatically.
  - callbacks (Optional[List[~CallbackStubType]]) – List of callback stub objects. See the finetuner.callback subpackage for available options, or run finetuner.list_callbacks().
  - scheduler (Optional[str]) – Name of a scheduler to use for learning rate scheduling. Supported types are: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup.
  - scheduler_options (Optional[Dict[str, Any]]) – Dictionary of additional parameters to pass to the scheduler: num_warmup_steps, num_training_steps, and scheduler_step (either batch or epoch).
  - freeze (bool) – If set to True, will freeze all layers except the last one.
  - output_dim (Optional[int]) – The expected output dimension as int. If set, will attach a projection head.
  - device (str) – The device to run the job on. If set to cuda, an Nvidia GPU will be used; set it to cpu to run a CPU job.
  - num_workers (int) – Number of CPU workers. If cpu: False, this is the number of workers used by the dataloader.
  - to_onnx (bool) – Set this parameter to True to convert the model to an ONNX model. Please note that not all models support this. If this parameter is set, please pass is_onnx when making inference, e.g., when calling the get_model function.
  - csv_options (Optional[CSVOptions]) – A CSVOptions object containing options used for reading in training and evaluation data from a CSV file, if they are provided as such.
  - public (bool) – A boolean value indicating whether the artifact is public. It should be set to True if you would like to share your fine-tuned model with others.
  - num_items_per_class (int) – How many items per class (unique labels) to include in a batch. For example, if batch_size is 20 and num_items_per_class is 4, the batch will consist of 4 items for each of the 5 classes. Batch size must be divisible by num_items_per_class.
  - sampler (str) – Determines which sampling method will be used if the data is labeled. Default is auto, meaning that the sampler will be the default for the loss function used. Setting it to class will result in the ClassSampler being used, and setting it to random will result in the RandomSampler being used. If set to random, then num_items_per_class is not used.
  - loss_optimizer (Optional[str]) – Name of the optimizer used for fine-tuning the loss function, if it is a function that requires an optimizer. Options: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, SGD. If left as None, the optimizer specified by the optimizer argument will be used instead.
  - loss_optimizer_options (Optional[Dict[str, Any]]) – Additional parameters to pass to the optimizer of the loss function. The set of applicable parameters is specific to the optimizer you choose. Details on the parameters can be found in the PyTorch documentation.
Note
Unless necessary, please stick with device="cuda"; CPU training could be extremely slow and inefficient.
- Return type:
  Run
- finetuner.create_synthesis_run(query_data, corpus_data, models, num_relations=10, run_name=None, description=None, experiment_name=None, device='cuda', num_workers=4, csv_options=None, public=False)#
Create a Finetuner synthesis Run. Calling this function will submit a data synthesis job to the Jina AI Cloud.
- Parameters:
  - query_data (Union[str, List[str], DocumentArray]) – Either a DocumentArray of example queries, the name of a DocumentArray that is pushed to Jina AI Cloud, the dataset itself as a list of strings, or a path to a CSV file.
  - corpus_data (Union[str, List[str], DocumentArray]) – Either a DocumentArray of corpus data, the name of a DocumentArray that is pushed to Jina AI Cloud, the dataset itself as a list of strings, or a path to a CSV file.
  - models (SynthesisModels) – A SynthesisModels object containing the names of the models used for relation mining and cross encoding. You can pass finetuner.data.DATA_SYNTHESIS_EN for the recommended models for synthesis based on English data.
  - num_relations (int) – The number of relations to mine per query.
  - run_name (Optional[str]) – Name of the run.
  - experiment_name (Optional[str]) – Name of the experiment.
  - device (str) – The device to run the job on. If set to cuda, an Nvidia GPU will be used; set it to cpu to run a CPU job.
  - num_workers (int) – Number of CPU workers. If cpu: False, this is the number of workers used by the dataloader.
  - csv_options (Optional[CSVOptions]) – A CSVOptions object containing options used for reading in training and evaluation data from a CSV file, if they are provided as such.
  - public (bool) – A boolean value indicating whether the artifact is public. It should be set to True if you would like to share your synthesized data with others.
- Param:
  description: Run description.
Note
Unless necessary, please stick with device="cuda"; CPU training could be extremely slow and inefficient.
- Return type:
  Run
- finetuner.get_run(run_name, experiment_name=None)[source]#
Get a Run by its name and (optional) Experiment name.
If an experiment name is not specified, we'll look for the run in the default experiment.
- Parameters:
  - run_name (str) – Name of the Run.
  - experiment_name (Optional[str]) – Optional name of the Experiment.
- Return type:
  Run
- Returns:
  A Run object.
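A short sketch of retrieving a run; the run name is illustrative, and downloading the artifact with Run.save_artifact() assumes the run has finished:

```python
import finetuner

finetuner.login()

# Look up a run in the default experiment by name.
run = finetuner.get_run('my-training-run')
print(run.status())

# Once the run has finished, download the fine-tuned model artifact locally.
run.save_artifact('finetuned-model')
```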
- finetuner.list_runs(experiment_name=None, page=1, size=50)[source]#
List all created Run objects inside a given Experiment.
If no Experiment is specified, list Run objects for all available Experiment objects.
- Parameters:
  - experiment_name (Optional[str]) – The name of the Experiment.
  - page (int) – The page index.
  - size (int) – Number of Run objects to retrieve per page.
- Return type:
  List[Run]
- Returns:
  List of all Run objects.
Note
page and size work together. For example, page 1 with size 50 gives the 50 runs on the first page. To get runs 50-100, set page to 2.
Note
The maximum size per page is 100.
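A short sketch of paginating through runs; the experiment name is illustrative:

```python
import finetuner

finetuner.login()

# Fetch the first page of up to 50 runs in one experiment.
runs = finetuner.list_runs(experiment_name='my-experiment', page=1, size=50)
for run in runs:
    print(run.name, run.status())
```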
- finetuner.delete_run(run_name, experiment_name=None)[source]#
Delete a Run given a run_name and optional experiment_name.
If an experiment name is not specified, we'll look for the run in the default experiment.
- Parameters:
  - run_name (str) – Name of the run. View your runs with list_runs().
  - experiment_name (Optional[str]) – Optional name of the experiment.
- Return type:
  None
- finetuner.delete_runs(experiment_name=None)[source]#
Delete all Run objects given an optional experiment_name.
If an experiment name is not specified, we'll delete every run across all experiments.
- Parameters:
  - experiment_name (Optional[str]) – Optional name of the experiment. View your experiment names with list_experiments().
- Return type:
  None
- finetuner.create_experiment(name='default')[source]#
Create an Experiment.
- Parameters:
  - name (str) – The name of the experiment. If not provided, the experiment is named default.
- Return type:
  Experiment
- Returns:
  An Experiment object.
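A short sketch of managing experiments with the functions in this module; the experiment name is illustrative, and reading exp.name assumes the Experiment object exposes its name as an attribute:

```python
import finetuner

finetuner.login()

# Create a named experiment to group related runs.
experiment = finetuner.create_experiment(name='my-experiment')

# List existing experiments (page and size work as described for list_runs).
for exp in finetuner.list_experiments(page=1, size=50):
    print(exp.name)

# Remove the experiment again when it is no longer needed.
finetuner.delete_experiment('my-experiment')
```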
- finetuner.get_experiment(name)[source]#
Get an Experiment given a name.
- Parameters:
  - name (str) – Name of the experiment.
- Return type:
  Experiment
- Returns:
  An Experiment object.
- finetuner.list_experiments(page=1, size=50)[source]#
List all Experiment objects.
- Parameters:
  - page (int) – The page index.
  - size (int) – The number of experiments to retrieve per page.
- Return type:
  List[Experiment]
- Returns:
  A list of Experiment instances.
Note
page and size work together. For example, page 1 with size 50 gives the 50 experiments on the first page. To get experiments 50-100, set page to 2.
Note
The maximum size per page is 100.
- finetuner.delete_experiment(name)[source]#
Delete an Experiment given a name.
- Parameters:
  - name (str) – Name of the experiment. View your experiment names with list_experiments().
- Return type:
  Experiment
- Returns:
  The deleted experiment.
- finetuner.delete_experiments()[source]#
Delete all Experiment objects.
- Return type:
  List[Experiment]
- Returns:
  List of deleted experiments.
- finetuner.get_token()[source]#
Get the user token from the Jina AI Cloud; login() is required first.
- Return type:
  str
- Returns:
  The user token as a string object.
- finetuner.build_model(name, model_options=None, batch_size=32, select_model=None, device=None, is_onnx=False)[source]#
Build a pre-trained model given a name.
- Parameters:
  - name (str) – Refers to a pre-trained model; see https://finetuner.jina.ai/walkthrough/choose-backbone/ or use the finetuner.describe_models() function for a list of all supported models.
  - model_options (Optional[Dict[str, Any]]) – A dictionary of model-specific options.
  - batch_size (int) – Incoming documents are fed to the graph in batches, both to speed up inference and to avoid memory errors. This argument controls the number of documents that will be put in each batch.
  - select_model (Optional[str]) – Finetuner run artifacts might contain multiple models. In such cases you can select which model to deploy using this argument. For CLIP fine-tuning, you can choose either clip-vision or clip-text.
  - device (Optional[str]) – The device to run on. If set to cuda, an Nvidia GPU will be used; set it to cpu to run on the CPU.
  - is_onnx (bool) – The model output format, either onnx or pt.
- Return type:
  InferenceEngine
- Returns:
  An instance of TorchInferenceEngine or ONNXRuntimeInferenceEngine.
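A minimal sketch of building a pre-trained backbone and encoding documents with it, assuming finetuner[full] and a compatible docarray version are installed; the model name is illustrative:

```python
import finetuner
from docarray import Document, DocumentArray

# Build a pre-trained model without any fine-tuning.
model = finetuner.build_model('bert-base-cased')

# Encode a small DocumentArray in place and inspect the embeddings.
docs = DocumentArray([Document(text='hello, world'), Document(text='goodbye, world')])
finetuner.encode(model=model, data=docs)
print(docs.embeddings.shape)
```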
- finetuner.get_model(artifact, token=None, batch_size=32, select_model=None, device=None, logging_level='WARNING', is_onnx=False)[source]#
Re-build the model based on the model inference session with ONNX.
- Parameters:
  - artifact (str) – Specify a Finetuner run artifact. Can be a path to a local directory, a path to a local zip file, a Hubble artifact ID, or a Hugging Face model created with Finetuner. Individual model artifacts (model sub-folders inside the run artifacts) can also be specified using this argument.
  - token (Optional[str]) – A Jina authentication token (required for pulling artifacts from Hubble) or a Hugging Face authentication token (required only when downloading private models from Hugging Face). If not provided, the Hubble client might try to find one either in a local cache folder or in the environment.
  - batch_size (int) – Incoming documents are fed to the graph in batches, both to speed up inference and to avoid memory errors. This argument controls the number of documents that will be put in each batch.
  - select_model (Optional[str]) – Finetuner run artifacts might contain multiple models. In such cases you can select which model to deploy using this argument. For CLIP fine-tuning, you can choose either clip-vision or clip-text.
  - device (Optional[str]) – The device to run on. If set to cuda, an Nvidia GPU will be used; set it to cpu to run on the CPU.
  - logging_level (str) – The executor logging level. See https://docs.python.org/3/library/logging.html#logging-levels for available options.
  - is_onnx (bool) – The model output format, either onnx or pt.
- Return type:
  InferenceEngine
- Returns:
  An instance of ONNXRuntimeInferenceEngine.
Note
Please install finetuner[full] to include all the dependencies.
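A minimal sketch of loading a fine-tuned model from a run artifact for inference, assuming finetuner[full] is installed; the artifact id is a placeholder:

```python
import finetuner
from docarray import Document, DocumentArray

finetuner.login()

# Load the fine-tuned model from a run artifact (local path, zip file, or Hubble artifact ID).
model = finetuner.get_model(artifact='<your-artifact-id>')

# Use the returned inference engine to embed new documents.
docs = DocumentArray([Document(text='how do I fine-tune embeddings?')])
finetuner.encode(model=model, data=docs)
print(docs.embeddings.shape)
```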
- finetuner.encode(model, data, batch_size=32)[source]#
Preprocess, collate, and encode a list of strings or a DocumentArray with embeddings.
- Parameters:
  - model (InferenceEngine) – The model to be used to encode the DocumentArray. In this case an instance of ONNXRuntimeInferenceEngine or TorchInferenceEngine produced by finetuner.get_model().
  - data (Union[DocumentArray, List[str]]) – The DocumentArray object to be encoded.
  - batch_size (int) – Incoming documents are fed to the graph in batches, both to speed up inference and to avoid memory errors. This argument controls the number of documents that will be put in each batch.
- Return type:
  Union[DocumentArray, ForwardRef]
- Returns:
  A DocumentArray filled with embeddings.
Note
Please install "finetuner[full]" to include all the dependencies.
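A short sketch of encoding a plain list of strings instead of a DocumentArray; the model name is illustrative, and the exact return value for list input follows the Union return type documented above:

```python
import finetuner

# Build a pre-trained backbone and encode raw strings directly.
model = finetuner.build_model('bert-base-cased')

# The return value follows the Union return type above (embeddings or a DocumentArray).
result = finetuner.encode(model=model, data=['hello', 'world'], batch_size=32)
print(result)
```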