finetuner.embedding module

finetuner.embedding.embed(docs, embed_model, device='cpu', batch_size=64, preprocess_fn=None, collate_fn=None)

Fill the embeddings of the Documents in place using embed_model.

Parameters
  • docs (DocumentArray) – the Documents to be embedded

  • embed_model (AnyDNN) – the embedding model, written in Keras/PyTorch/Paddle

  • device (str) – the computational device for embed_model, either cpu or cuda

  • batch_size (int) – number of Documents in a batch for embedding

  • preprocess_fn (Optional[Callable]) – a pre-processing function applied to each Document on the fly. It should take a Document from the dataset as input and output whatever content the model accepts.

  • collate_fn (Optional[Callable]) – the collation function that merges the content of individual items into a batch. It should accept a list with the content of each item and output a tensor (or a list/dict of tensors) that feeds directly into the embedding model.

Return type

None
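
For orientation, here is a minimal usage sketch, not taken from the reference above. It assumes a toy PyTorch embedding model and the Document/DocumentArray classes from docarray; the model, tensor shapes, batch size, and the illustrative preprocess_fn/collate_fn are assumptions, and the exact default collation behaviour may differ between Finetuner versions.

import numpy as np
import torch
from docarray import Document, DocumentArray

from finetuner.embedding import embed

# Toy PyTorch model: flattens a 28x28 input and maps it to a 32-dim embedding.
embed_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 32),
)

# Eight Documents, each carrying a random 28x28 array as its content.
docs = DocumentArray(
    Document(tensor=np.random.rand(28, 28)) for _ in range(8)
)


def preprocess_fn(doc):
    # Take a Document and return the content the model should consume.
    return doc.tensor.astype('float32')


def collate_fn(contents):
    # Merge the per-item contents into a single batch tensor for the model.
    return torch.tensor(np.stack(contents))


# Fills doc.embedding for every Document in place; embed() returns None.
embed(
    docs,
    embed_model,
    device='cpu',
    batch_size=4,
    preprocess_fn=preprocess_fn,
    collate_fn=collate_fn,
)

print(docs.embeddings.shape)  # e.g. (8, 32)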