finetuner.tuner.paddle.datasets module
- class finetuner.tuner.paddle.datasets.PaddleClassDataset(docs, preprocess_fn=None)
Bases: finetuner.tuner.dataset.datasets.ClassDataset, paddle.io.Dataset
Create the dataset instance.
- Parameters
docs (DocumentArray) – The documents for the dataset. Each document is expected to have:
  - content (currently only tensor or text content is accepted);
  - a class label, saved under tags['finetuner__label']. The class label should be an integer or a string.
preprocess_fn (Optional[ForwardRef]) – A function that pre-processes documents on the fly. It should take a document from the dataset as input and output whatever content the framework-specific dataloader (and model) accepts.
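The following dependency-free sketch makes the expected document layout concrete by showing what a class-labelled dataset of this shape might do. It is not finetuner's implementation. `StubDoc` is a hypothetical stand-in for a DocArray `Document`, and `SketchClassDataset` is a hypothetical name. Mapping each distinct int-or-string label to a consecutive integer id is an assumption about how mixed label types could be handled. Like a map-style `paddle.io.Dataset`, the sketch exposes `__len__` and `__getitem__`, and it calls `preprocess_fn` when an item is requested rather than up front.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

LABEL_KEY = 'finetuner__label'  # tag key named in the parameter description


@dataclass
class StubDoc:
    """Hypothetical stand-in for a DocArray Document (content and tags only)."""
    content: Any
    tags: Dict[str, Any] = field(default_factory=dict)


class SketchClassDataset:
    """Hypothetical sketch of a class-labelled, map-style dataset."""

    def __init__(self, docs: List[StubDoc],
                 preprocess_fn: Optional[Callable[[StubDoc], Any]] = None):
        self._docs = docs
        self._preprocess_fn = preprocess_fn
        self._label_map: Dict[Any, int] = {}
        self.labels: List[int] = []
        for doc in docs:
            raw_label = doc.tags[LABEL_KEY]  # an int or a str
            # Assumption: each distinct raw label gets the next integer id
            class_id = self._label_map.setdefault(raw_label, len(self._label_map))
            self.labels.append(class_id)

    def __len__(self) -> int:
        return len(self._docs)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        doc = self._docs[index]
        # Pre-process lazily, only when the item is requested
        if self._preprocess_fn is not None:
            content = self._preprocess_fn(doc)
        else:
            content = doc.content
        return content, self.labels[index]


docs = [
    StubDoc('cat photo', {LABEL_KEY: 'cat'}),
    StubDoc('dog photo', {LABEL_KEY: 'dog'}),
    StubDoc('another cat', {LABEL_KEY: 'cat'}),
]
ds = SketchClassDataset(docs, preprocess_fn=lambda d: d.content.upper())
print(ds[2])  # ('ANOTHER CAT', 0)
```

Running the pre-processing inside `__getitem__` leaves the stored documents untouched, which matches the "on the fly" behaviour the parameter description asks for.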
- class finetuner.tuner.paddle.datasets.PaddleSessionDataset(docs, preprocess_fn=None)
Bases: finetuner.tuner.dataset.datasets.SessionDataset, paddle.io.Dataset
Create the dataset instance.
- Parameters
docs (DocumentArray) – The documents for the dataset. Each document is expected to have:
  - content (currently only tensor or text content is accepted);
  - matches, which should also have content, as well as a label stored under tags['finetuner__label']. This label should be either 1 or -1, denoting whether the match is a positive or a negative input in relation to the anchor document.
preprocess_fn (Optional[ForwardRef]) – A function that pre-processes documents on the fly. It should take a document from the dataset as input and output whatever content the framework-specific dataloader (and model) accepts.
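The sketch below shows one plausible way to flatten sessions (an anchor plus its labelled matches) into individual dataset items. `StubDoc` and `SketchSessionDataset` are hypothetical stand-ins, not finetuner or DocArray APIs. The `(session index, relation)` label tuple, with relation `0` for the anchor itself, is an assumption made for illustration. Only the `1` / `-1` match labels under `tags['finetuner__label']` come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

LABEL_KEY = 'finetuner__label'  # tag key named in the parameter description


@dataclass
class StubDoc:
    """Hypothetical stand-in for a DocArray Document with matches."""
    content: Any
    tags: Dict[str, Any] = field(default_factory=dict)
    matches: List['StubDoc'] = field(default_factory=list)


class SketchSessionDataset:
    """Hypothetical sketch: one session = an anchor plus its matches."""

    def __init__(self, docs: List[StubDoc],
                 preprocess_fn: Optional[Callable[[StubDoc], Any]] = None):
        self._preprocess_fn = preprocess_fn
        self._items: List[StubDoc] = []
        # Assumption: label = (session index, relation), where relation is 0
        # for the anchor and +1 / -1 for positive / negative matches
        self.labels: List[Tuple[int, int]] = []
        for session, anchor in enumerate(docs):
            self._items.append(anchor)
            self.labels.append((session, 0))
            for match in anchor.matches:
                relation = match.tags[LABEL_KEY]
                if relation not in (1, -1):
                    raise ValueError(f'match label must be 1 or -1, got {relation!r}')
                self._items.append(match)
                self.labels.append((session, relation))

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Tuple[Any, Tuple[int, int]]:
        doc = self._items[index]
        # Pre-process lazily, only when the item is requested
        if self._preprocess_fn is not None:
            content = self._preprocess_fn(doc)
        else:
            content = doc.content
        return content, self.labels[index]


anchor = StubDoc('red shoe', matches=[
    StubDoc('crimson sneaker', {LABEL_KEY: 1}),
    StubDoc('blue hat', {LABEL_KEY: -1}),
])
ds = SketchSessionDataset([anchor])
print(len(ds), ds[2])  # 3 ('blue hat', (0, -1))
```

Keeping the session index in every label would let a loss function or sampler recover which anchor a match belongs to, even after the items are shuffled.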
- class finetuner.tuner.paddle.datasets.PaddleInstanceDataset(docs, preprocess_fn=None)
Bases: finetuner.tuner.dataset.datasets.InstanceDataset, paddle.io.Dataset
Create the dataset instance.
- Parameters
docs (DocumentArray) – The documents for the dataset. Each document is expected to have content; no label is needed.
preprocess_fn (Optional[ForwardRef]) – A function that pre-processes documents on the fly. It should take a document from the dataset as input and output whatever content the framework-specific dataloader (and model) accepts.
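An instance dataset carries no labels, so a sketch of it only needs to return content, optionally pre-processed. The example below also shows one possible `preprocess_fn`. It uses a hypothetical factory, `make_text_preprocess`, which turns text into a fixed-length list of token ids using a toy vocabulary. The id conventions (0 for padding, 1 for unknown tokens) are assumptions. `StubDoc` and `SketchInstanceDataset` are hypothetical stand-ins, not finetuner or DocArray APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class StubDoc:
    """Hypothetical stand-in for a DocArray Document; no label tag required."""
    content: Any


def make_text_preprocess(vocab: Dict[str, int],
                         max_len: int) -> Callable[[StubDoc], List[int]]:
    """Hypothetical preprocess_fn factory: text -> fixed-length token ids."""
    def preprocess(doc: StubDoc) -> List[int]:
        # Assumption: id 1 marks unknown tokens and id 0 marks padding
        ids = [vocab.get(token, 1) for token in doc.content.lower().split()]
        ids = ids[:max_len]  # truncate long inputs
        return ids + [0] * (max_len - len(ids))  # pad short inputs
    return preprocess


class SketchInstanceDataset:
    """Hypothetical unlabeled, map-style dataset: items are content only."""

    def __init__(self, docs: List[StubDoc],
                 preprocess_fn: Optional[Callable[[StubDoc], Any]] = None):
        self._docs = docs
        self._preprocess_fn = preprocess_fn

    def __len__(self) -> int:
        return len(self._docs)

    def __getitem__(self, index: int) -> Any:
        doc = self._docs[index]
        if self._preprocess_fn is not None:
            return self._preprocess_fn(doc)
        return doc.content


vocab = {'hello': 2, 'world': 3}
ds = SketchInstanceDataset([StubDoc('Hello brave world')],
                           preprocess_fn=make_text_preprocess(vocab, max_len=5))
print(ds[0])  # [2, 1, 3, 0, 0]
```

Returning equal-length id lists is one way to make every item batchable by a default dataloader without a custom collate function.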