finetuner.embedding module
- finetuner.embedding.embed(docs, embed_model, device='cpu', batch_size=64, preprocess_fn=None, collate_fn=None)
Fill the embeddings of Documents in place by using embed_model.
- Parameters
docs (DocumentArray) – the Documents to be embedded
embed_model (AnyDNN) – the embedding model, written in Keras/PyTorch/Paddle
device (str) – the computational device for embed_model, either cpu or cuda
batch_size (int) – number of Documents in a batch for embedding
preprocess_fn (Optional[ForwardRef]) – a pre-processing function, applied to documents on the fly. It should take a document from the dataset as input and output whatever content the model accepts.
collate_fn (Optional[ForwardRef]) – the collation function that merges the content of individual items into a batch. It should accept a list with the content of each item and output a tensor (or a list/dict of tensors) that is fed directly into the embedding model.
- Return type
None
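The way batch_size, preprocess_fn and collate_fn interact, as described in the parameters above, can be sketched in plain Python. This is an illustration under stated assumptions, not Finetuner's implementation: `embed_sketch`, `toy_model`, `pad_collate` and the dict-based document stand-in are all hypothetical names, and plain lists take the place of tensors so the snippet runs without any framework installed.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for a Document: {"content": ..., "embedding": ...}
Doc = Dict[str, Any]


def embed_sketch(
    docs: List[Doc],
    embed_model: Callable[[Any], List[List[float]]],
    batch_size: int = 64,
    preprocess_fn: Optional[Callable[[Doc], Any]] = None,
    collate_fn: Optional[Callable[[List[Any]], Any]] = None,
) -> None:
    """Illustrative only: fill doc["embedding"] in place, batch by batch."""
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        # Per-document step: turn each document into model-ready content.
        items = [preprocess_fn(d) if preprocess_fn else d["content"] for d in batch]
        # Per-batch step: merge the individual items into one model input.
        model_input = collate_fn(items) if collate_fn else items
        embeddings = embed_model(model_input)
        # Write the results back onto the documents; nothing is returned.
        for doc, emb in zip(batch, embeddings):
            doc["embedding"] = emb


def toy_model(batch: List[List[int]]) -> List[List[float]]:
    # Hypothetical "model": maps each row to [sum of values, row length].
    return [[float(sum(row)), float(len(row))] for row in batch]


def pad_collate(items: List[List[int]]) -> List[List[int]]:
    # Pad every item with zeros to the longest item in the batch.
    width = max(len(x) for x in items)
    return [x + [0] * (width - len(x)) for x in items]


docs = [{"content": "ab"}, {"content": "abc"}, {"content": "a"}]
result = embed_sketch(
    docs,
    toy_model,
    batch_size=2,
    preprocess_fn=lambda d: [ord(c) for c in d["content"]],
    collate_fn=pad_collate,
)
```

With batch_size=2 the three documents are processed as two batches, so padding is computed per batch rather than over the whole collection. As with the documented return type, the call returns None and the results are read from the documents themselves.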