finetuner.tuner.pytorch.datasets module

class finetuner.tuner.pytorch.datasets.PytorchClassDataset(docs, preprocess_fn=None)

Bases: finetuner.tuner.dataset.datasets.ClassDataset, torch.utils.data.Dataset

Create the dataset instance.

Parameters
  • docs (DocumentSequence) –

    The documents for the dataset. Each document is expected to have:

    - a content (only blob or text are currently accepted)
    - a class label, saved under tags['finetuner__label']. This class label should be an integer or a string.

  • preprocess_fn (Optional[ForwardRef]) – A pre-processing function, applied to documents on the fly. It should take a document from the dataset as input and return whatever content the framework-specific dataloader (and model) accepts.
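
As a rough illustration of the document layout described above, the sketch below uses a hypothetical stand-in class, FakeDoc, with text, blob and tags attributes. It is not the real document class and it does not call PytorchClassDataset. The helpers check_class_doc and lowercase_text are likewise names invented for this example; they only show the label requirement and the general shape a preprocess_fn might take (document in, model-ready content out).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Union

LABEL_KEY = "finetuner__label"  # tag key named in the parameter description


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a document; not the real document class."""
    text: Optional[str] = None
    blob: Any = None
    tags: Dict[str, Any] = field(default_factory=dict)


def check_class_doc(doc: FakeDoc) -> Union[int, str]:
    """Return the class label, or raise if the layout is not as described."""
    if doc.text is None and doc.blob is None:
        raise ValueError("document needs a content (blob or text)")
    label = doc.tags.get(LABEL_KEY)
    if not isinstance(label, (int, str)):
        raise ValueError(f"tags[{LABEL_KEY!r}] must be an integer or a string")
    return label


def lowercase_text(doc: FakeDoc) -> str:
    """Illustrative preprocess_fn: takes a document, returns its content."""
    return doc.text.lower()


docs = [
    FakeDoc(text="Red Shoes", tags={LABEL_KEY: "shoes"}),
    FakeDoc(text="Blue Hat", tags={LABEL_KEY: 3}),
]
labels = [check_class_doc(d) for d in docs]
processed = [lowercase_text(d) for d in docs]
```

With the real library, a function shaped like lowercase_text would presumably be passed as preprocess_fn, but what it must return depends on the model and dataloader in use.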

class finetuner.tuner.pytorch.datasets.PytorchSessionDataset(docs, preprocess_fn=None)

Bases: finetuner.tuner.dataset.datasets.SessionDataset, torch.utils.data.Dataset

Create the dataset instance.

Parameters
  • docs (DocumentSequence) –

    The documents for the dataset. Each document is expected to have:

    - a content (only blob or text are currently accepted)
    - matches, which should also have content, as well as a label, stored under tags['finetuner__label'], which should be either 1 or -1, denoting whether the match is a positive or negative input in relation to the anchor document.

  • preprocess_fn (Optional[ForwardRef]) – A pre-processing function, applied to documents on the fly. It should take a document from the dataset as input and return whatever content the framework-specific dataloader (and model) accepts.
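
The session layout (an anchor document whose matches carry a label of 1 or -1) can be sketched in the same spirit. FakeDoc below is again a hypothetical stand-in with a matches list, and split_matches is a helper invented for this example. Neither is part of the library, and the code does not reproduce how PytorchSessionDataset processes its input.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

LABEL_KEY = "finetuner__label"  # tag key named in the parameter description


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a document; not the real document class."""
    text: Optional[str] = None
    blob: Any = None
    tags: Dict[str, Any] = field(default_factory=dict)
    matches: List["FakeDoc"] = field(default_factory=list)


def split_matches(anchor: FakeDoc) -> Tuple[List[FakeDoc], List[FakeDoc]]:
    """Separate an anchor's matches into positives (1) and negatives (-1)."""
    positives, negatives = [], []
    for match in anchor.matches:
        label = match.tags.get(LABEL_KEY)
        if label == 1:
            positives.append(match)
        elif label == -1:
            negatives.append(match)
        else:
            raise ValueError(f"match label must be 1 or -1, got {label!r}")
    return positives, negatives


anchor = FakeDoc(
    text="running shoes",
    matches=[
        FakeDoc(text="trail sneakers", tags={LABEL_KEY: 1}),
        FakeDoc(text="winter coat", tags={LABEL_KEY: -1}),
    ],
)
positives, negatives = split_matches(anchor)
```

The anchor itself carries no label here; only its matches do, following the description above.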