finetuner.tuner.paddle.datasets module

class finetuner.tuner.paddle.datasets.PaddleClassDataset(docs, preprocess_fn=None)[source]

Bases: finetuner.tuner.dataset.datasets.ClassDataset, paddle.io.Dataset

Create the dataset instance.

Parameters
  • docs (DocumentSequence) –

    The documents for the dataset. Each document is expected to have:

      - a content (only blob or text are accepted currently)
      - a class label, saved under tags['finetuner__label']. This class label should be an integer or a string

  • preprocess_fn (Optional[ForwardRef]) – A function that pre-processes documents on the fly. It should take a document from the dataset as input and return whatever content the framework-specific dataloader (and model) would accept.
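
The document shape described above can be illustrated with a minimal sketch. It deliberately does not import finetuner or paddle: StubDoc, check_class_docs and lowercase_text are hypothetical stand-ins invented for this illustration, and only the tag name finetuner__label comes from the parameter description. It shows documents that carry content plus a class label, and a preprocess_fn-style callable that takes one document and returns model-ready content.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

LABEL_KEY = "finetuner__label"  # tag name taken from the parameter description


@dataclass
class StubDoc:
    """Hypothetical stand-in for a document; not the library's own class."""
    content: Any
    tags: Dict[str, Any] = field(default_factory=dict)


def check_class_docs(docs: List[StubDoc]) -> None:
    """Raise ValueError if a document lacks content or a valid class label."""
    for i, doc in enumerate(docs):
        if doc.content is None:
            raise ValueError(f"document {i} has no content")
        label = doc.tags.get(LABEL_KEY)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(label, bool) or not isinstance(label, (int, str)):
            raise ValueError(f"document {i} has an invalid label: {label!r}")


def lowercase_text(doc: StubDoc) -> str:
    """Example preprocess_fn: one document in, model-ready content out."""
    return doc.content.lower()


docs = [
    StubDoc("A Red Shoe", {LABEL_KEY: "shoe"}),  # string label
    StubDoc("A Blue Hat", {LABEL_KEY: 7}),       # integer label
]
check_class_docs(docs)
processed = [lowercase_text(d) for d in docs]
```

With the real classes, a callable like lowercase_text would presumably be passed as preprocess_fn and applied to each document as it is fetched, rather than in a list comprehension up front.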

class finetuner.tuner.paddle.datasets.PaddleSessionDataset(docs, preprocess_fn=None)[source]

Bases: finetuner.tuner.dataset.datasets.SessionDataset, paddle.io.Dataset

Create the dataset instance.

Parameters
  • docs (DocumentSequence) –

    The documents for the dataset. Each document is expected to have:

      - a content (only blob or text are accepted currently)
      - matches, each of which should also have content, as well as a label stored under tags['finetuner__label']. The label should be either 1 or -1, denoting whether the match is a positive or negative input in relation to the anchor document

  • preprocess_fn (Optional[ForwardRef]) – A function that pre-processes documents on the fly. It should take a document from the dataset as input and return whatever content the framework-specific dataloader (and model) would accept.
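
The session layout described above (an anchor document whose matches are labelled 1 or -1) can be sketched in the same spirit. This is an illustration under stated assumptions, not the library's implementation: StubSessionDoc and split_matches are hypothetical names, and only the tag name finetuner__label and the 1 / -1 convention come from the parameter description.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

LABEL_KEY = "finetuner__label"  # tag name taken from the parameter description


@dataclass
class StubSessionDoc:
    """Hypothetical stand-in for a document with matches."""
    content: Any
    tags: Dict[str, Any] = field(default_factory=dict)
    matches: List["StubSessionDoc"] = field(default_factory=list)


def split_matches(anchor: StubSessionDoc) -> Tuple[List[Any], List[Any]]:
    """Split an anchor's matches into (positives, negatives) by their label."""
    positives, negatives = [], []
    for match in anchor.matches:
        label = match.tags.get(LABEL_KEY)
        if label == 1:
            positives.append(match.content)
        elif label == -1:
            negatives.append(match.content)
        else:
            raise ValueError(f"match label must be 1 or -1, got {label!r}")
    return positives, negatives


anchor = StubSessionDoc(
    "red running shoe",
    matches=[
        StubSessionDoc("crimson sneaker", {LABEL_KEY: 1}),   # positive
        StubSessionDoc("blue winter hat", {LABEL_KEY: -1}),  # negative
    ],
)
positives, negatives = split_matches(anchor)
```

Each anchor together with its matches forms one session, so the positive and negative examples are only meaningful relative to that anchor.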