finetuner.tuner package#
Subpackages#
- finetuner.tuner.callback package
  - Submodules
    - finetuner.tuner.callback.base module
    - finetuner.tuner.callback.best_model_checkpoint module
    - finetuner.tuner.callback.early_stopping module
    - finetuner.tuner.callback.evaluation module
    - finetuner.tuner.callback.progress_bar module
    - finetuner.tuner.callback.training_checkpoint module
    - finetuner.tuner.callback.wandb_logger module
  - Module contents
- finetuner.tuner.dataset package
- finetuner.tuner.keras package
- finetuner.tuner.miner package
- finetuner.tuner.paddle package
- finetuner.tuner.pytorch package
Submodules#
Module contents#
- finetuner.tuner.fit(embed_model, train_data, eval_data=None, preprocess_fn=None, collate_fn=None, epochs=10, batch_size=256, num_items_per_class=None, loss='SiameseLoss', configure_optimizer=None, learning_rate=0.001, scheduler_step='batch', device='cpu', callbacks=None, num_workers=0, **kwargs)[source]#
Finetune the model on the training data.
- Parameters
embed_model (AnyDNN) – an embedding model.
train_data (DocumentArray) – Data on which to train the model.
eval_data (Optional[DocumentArray]) – Data on which the validation loss is computed.
preprocess_fn (Optional[Callable]) – A pre-processing function, applied to documents on the fly. It should take as input a document from the dataset and output whatever content the framework-specific dataloader (and model) accepts.
collate_fn (Optional[Callable]) – The collation function to merge the content of individual items into a batch. It should accept a list with the content of each item and output a tensor (or a list/dict of tensors) that feeds directly into the embedding model.
epochs (int) – Number of epochs to train the model.
batch_size (int) – The batch size to use for training and evaluation.
num_items_per_class (Optional[int]) – Number of items from a single class to include in the batch. Only relevant for ClassDataset.
loss (Union[str, BaseLoss]) – Which loss to use in training. Supported losses are SiameseLoss (for a Siamese network) and TripletLoss (for a Triplet network).
configure_optimizer (Optional[Callable]) – A function that allows you to provide a custom optimizer and learning rate. The function should take one input (the embedding model) and return either just an optimizer, or a tuple of an optimizer and a learning rate scheduler.
learning_rate (float) – Learning rate for the default optimizer. If you provide a custom optimizer, this learning rate will not apply.
scheduler_step (str) – At which interval the learning rate scheduler’s step function should be called. Valid options are "batch" and "epoch".
device (str) – The device to which to move the model. Supported options are "cpu" and "cuda" (for GPU).
callbacks (Optional[List[BaseCallback]]) – A list of callbacks. The progress bar callback will be prepended to this list.
num_workers (int) – Number of workers used for loading the data. This only applies to PyTorch and PaddlePaddle, and has no effect when using a Keras model.
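A minimal usage sketch follows, assuming a PyTorch embedding model and labeled DocumentArrays `train_docs` and `eval_docs` prepared elsewhere (both names are hypothetical, not part of the API). It also shows a custom `configure_optimizer` callable returning an optimizer together with a scheduler:

```python
import torch
from finetuner.tuner import fit

# Hypothetical embedding model; any framework-specific DNN works.
embed_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 32),
)

def configure_optimizer(model):
    # Return either an optimizer, or an (optimizer, scheduler) tuple.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
    return optimizer, scheduler

# `train_docs` / `eval_docs` are assumed to be labeled DocumentArrays.
fit(
    embed_model,
    train_data=train_docs,
    eval_data=eval_docs,
    epochs=5,
    batch_size=128,
    loss='TripletLoss',
    configure_optimizer=configure_optimizer,
    scheduler_step='epoch',
    device='cpu',
)
```

Because `configure_optimizer` is provided here, the `learning_rate` argument would be ignored, and the scheduler's step function is called once per epoch as requested by `scheduler_step='epoch'`.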
- finetuner.tuner.save(embed_model, model_path, *args, **kwargs)[source]#
Save the embedding model.
- Parameters
embed_model (AnyDNN) – The embedding model to save.
model_path (str) – Path to the file/folder where to save the model.
args – Arguments to pass to the framework-specific tuner’s save method.
kwargs – Keyword arguments to pass to the framework-specific tuner’s save method.
- Return type
None
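A brief sketch of saving a tuned model, reusing the `embed_model` from the example above; the output path is a hypothetical choice:

```python
from finetuner.tuner import save

# Persist the tuned embedding model; any extra args/kwargs are forwarded
# to the framework-specific tuner's own save method.
save(embed_model, './tuned_model')
```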