finetuner.embedding module

finetuner.embedding.embed(docs, embed_model, device='cpu', batch_size=256, preprocess_fn=None, collate_fn=None)

Fill in the embeddings of Documents in place using embed_model.
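To illustrate what "in place" and batch_size mean here, the following is a minimal, framework-free sketch. It is not finetuner's implementation. The Doc class, embed_inplace and toy_model are hypothetical stand-ins: Doc replaces a real Document, and toy_model is a plain callable standing in for a Keras/PyTorch/Paddle model.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: just content and an embedding slot."""
    content: Any
    embedding: Optional[List[float]] = None


def embed_inplace(
    docs: Sequence[Doc],
    embed_model: Callable[[List[Any]], List[List[float]]],
    batch_size: int = 256,
) -> None:
    """Set doc.embedding on every Doc, sending batch_size Docs per model call."""
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        # One forward pass per batch of contents.
        vectors = embed_model([d.content for d in batch])
        # Write results back onto the original objects; nothing is returned.
        for doc, vec in zip(batch, vectors):
            doc.embedding = vec


# Toy "model": maps each number x to the 2-d vector [x, x * x]
# and records how many items it received per call.
calls = []


def toy_model(contents):
    calls.append(len(contents))
    return [[float(x), float(x * x)] for x in contents]


docs = [Doc(content=i) for i in range(5)]
result = embed_inplace(docs, toy_model, batch_size=2)
print([d.embedding for d in docs])
print(calls)
```

With five Documents and batch_size=2, the model is called three times (batches of 2, 2 and 1), and the embeddings live on the original objects afterwards rather than in a returned value.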

  • docs (Union[ForwardRef, ForwardRef]) – the Documents to be embedded

  • embed_model (AnyDNN) – the embedding model, written in Keras, PyTorch, or Paddle

  • device (str) – the device on which embed_model runs; either cpu or cuda.

  • batch_size (int) – the number of Documents embedded per batch

  • preprocess_fn (Optional[ForwardRef]) – A function that pre-processes each Document on the fly. It takes a Document from the dataset as input and returns content that the model accepts.

  • collate_fn (Optional[ForwardRef]) – The collation function that merges the content of individual items into a batch. It should accept a list containing the content of each item and return a tensor (or a list/dict of tensors) that feeds directly into the embedding model.
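The interplay of preprocess_fn and collate_fn described above can be sketched as follows. This is an illustrative assumption about the order of operations (preprocess each Document, collate the batch, then call the model), not finetuner's code. Doc, embed_with_hooks, toy_model and both hook functions are hypothetical. Padded nested lists stand in for the tensor a real collate_fn would return.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document holding text."""
    text: str
    embedding: Optional[List[float]] = None


def preprocess_fn(doc: Doc) -> List[int]:
    # Hypothetical: turn a Document's lowercased text into character codes.
    return [ord(c) for c in doc.text.lower()]


def collate_fn(items: List[List[int]]) -> List[List[int]]:
    # Hypothetical: right-pad each item with 0 up to the longest item in the batch,
    # making the batch rectangular (a real collate_fn would return a tensor).
    width = max(len(x) for x in items)
    return [x + [0] * (width - len(x)) for x in items]


def toy_model(batch: List[List[int]]) -> List[List[float]]:
    # Hypothetical model: embedding = [padded row length, sum of codes].
    return [[float(len(row)), float(sum(row))] for row in batch]


def embed_with_hooks(docs, model, batch_size=256, preprocess_fn=None, collate_fn=None):
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        # Per-Document preprocessing, falling back to the raw text.
        contents = [preprocess_fn(d) if preprocess_fn else d.text for d in batch]
        # Merge the per-item contents into a single model input.
        model_input = collate_fn(contents) if collate_fn else contents
        for doc, vec in zip(batch, model(model_input)):
            doc.embedding = vec


docs = [Doc("Hi"), Doc("abc")]
embed_with_hooks(docs, toy_model, batch_size=2,
                 preprocess_fn=preprocess_fn, collate_fn=collate_fn)
print([d.embedding for d in docs])
```

Keeping the two hooks separate lets per-item logic (such as tokenizing one Document) stay independent of batch-level logic (such as padding to a common length), which only makes sense once all items in a batch are known.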

Return type