finetuner.tuner.dataset.datasets module
- class finetuner.tuner.dataset.datasets.InstanceDataset(docs, preprocess_fn=None)
Bases: finetuner.tuner.dataset.base.BaseDataset[int]
Dataset for unlabeled data (for self-supervised learning).
In this dataset, each instance (each item) is given its own label.
Create the dataset instance.
- Parameters
docs (DocumentArray) – The documents for the dataset. Each document is expected to have content, but no label is needed.
preprocess_fn (Optional[ForwardRef]) – A pre-processing function applied to documents on the fly. It should take a document from the dataset as input and return whatever content the framework-specific dataloader (and model) accepts.
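The idea that each item gets its own label, and that preprocess_fn is applied per document, can be sketched in plain Python. This is only an illustration under stated assumptions: plain dicts stand in for documents, the helper name instance_items is hypothetical, and using the item's index as its label is an assumption rather than the library's confirmed behaviour.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Doc = Dict[str, Any]  # stand-in for a document; NOT the real Document class


def instance_items(
    docs: List[Doc],
    preprocess_fn: Optional[Callable[[Doc], Any]] = None,
) -> List[Tuple[Any, int]]:
    """Pair every document's content with a label unique to that document.

    Assumption: the unique label is simply the document's index.
    """
    items = []
    for label, doc in enumerate(docs):
        # Apply the optional pre-processing function to the whole document;
        # otherwise fall back to the raw content.
        content = preprocess_fn(doc) if preprocess_fn is not None else doc["content"]
        items.append((content, label))
    return items


docs = [{"content": "Hello"}, {"content": "World"}, {"content": "Hello"}]
items = instance_items(docs, preprocess_fn=lambda d: d["content"].lower())
# Identical content still receives distinct labels.
```

Note that the two "Hello" documents end up with different labels, which is what distinguishes this setting from a class-labelled dataset.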
- class finetuner.tuner.dataset.datasets.ClassDataset(docs, preprocess_fn=None)
Bases: finetuner.tuner.dataset.base.BaseDataset[int]
Dataset for encapsulating data where each item has a class label.
Create the dataset instance.
- Parameters
docs (DocumentArray) – The documents for the dataset. Each document is expected to have:
  - content (only tensor or text are accepted currently)
  - a class label, saved under tags['finetuner__label']. This class label should be an integer or a string.
preprocess_fn (Optional[ForwardRef]) – A pre-processing function applied to documents on the fly. It should take a document from the dataset as input and return whatever content the framework-specific dataloader (and model) accepts.
- property labels: List[int]
Get the list of integer labels for all items in the dataset.
- Return type
List[int]
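Since class labels may be integers or strings while labels returns integers, some mapping from raw labels to integers is implied. A minimal sketch of one way this could work follows. Plain dicts stand in for documents, the helper name integer_labels is hypothetical, and numbering classes in order of first appearance is an assumption, not the documented scheme.

```python
from typing import Any, Dict, Hashable, List

Doc = Dict[str, Any]  # stand-in for a document; NOT the real Document class


def integer_labels(docs: List[Doc]) -> List[int]:
    """Map each raw class label (int or str) to an integer label.

    Assumption: integers are assigned in order of first appearance.
    """
    mapping: Dict[Hashable, int] = {}
    labels: List[int] = []
    for doc in docs:
        raw = doc["tags"]["finetuner__label"]
        if raw not in mapping:
            mapping[raw] = len(mapping)  # next unused integer
        labels.append(mapping[raw])
    return labels


docs = [
    {"content": "a photo of a cat", "tags": {"finetuner__label": "cat"}},
    {"content": "a photo of a dog", "tags": {"finetuner__label": "dog"}},
    {"content": "another cat", "tags": {"finetuner__label": "cat"}},
    {"content": "something else", "tags": {"finetuner__label": 7}},
]
labels = integer_labels(docs)  # [0, 1, 0, 2]
```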
- class finetuner.tuner.dataset.datasets.SessionDataset(docs, preprocess_fn=None)
Bases: finetuner.tuner.dataset.base.BaseDataset[Tuple[int, int]]
Dataset for encapsulating data that comes in batches of “sessions”.
A session here means an anchor document together with a set of matches, each of which may be either a positive or a negative input.
Create the dataset instance.
- Parameters
docs (DocumentArray) – The documents for the dataset. Each document is expected to have:
  - content (only tensor or text are accepted currently)
  - matches, which should also have content, as well as a label stored under tags['finetuner__label']. The label should be either 1 or -1, denoting whether the match is a positive or negative input in relation to the anchor document.
preprocess_fn (Optional[ForwardRef]) – A pre-processing function applied to documents on the fly. It should take a document from the dataset as input and return whatever content the framework-specific dataloader (and model) accepts.
- property labels: List[Tuple[int, int]]
Get the list of labels for all items in the dataset.
A label consists of two integers: the session ID (the index of the root document in the original document array) and the type, which is 0 if the document is the anchor (root) document, 1 if it is a positive input (match), and -1 if it is a negative input (match).
- Return type
List[Tuple[int, int]]
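The (session ID, type) label layout described above can be illustrated with a short plain-Python sketch. Plain dicts stand in for documents and their matches, the helper name session_labels is hypothetical, and emitting the anchor before its matches is an assumed ordering; none of this is taken from the library's source.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]  # stand-in for a document; NOT the real Document class


def session_labels(docs: List[Doc]) -> List[Tuple[int, int]]:
    """Build (session_id, type) labels for anchors and their matches.

    session_id: index of the anchor (root) document in ``docs``.
    type: 0 for the anchor, 1 for a positive match, -1 for a negative match.
    Assumption: each anchor is listed first, followed by its matches.
    """
    labels: List[Tuple[int, int]] = []
    for session_id, anchor in enumerate(docs):
        labels.append((session_id, 0))  # the anchor itself
        for match in anchor["matches"]:
            match_type = int(match["tags"]["finetuner__label"])
            if match_type not in (1, -1):
                raise ValueError("match label must be 1 or -1")
            labels.append((session_id, match_type))
    return labels


docs = [
    {
        "content": "query A",
        "matches": [
            {"content": "relevant to A", "tags": {"finetuner__label": 1}},
            {"content": "irrelevant to A", "tags": {"finetuner__label": -1}},
        ],
    },
    {
        "content": "query B",
        "matches": [
            {"content": "relevant to B", "tags": {"finetuner__label": 1}},
        ],
    },
]
labels = session_labels(docs)
# [(0, 0), (0, 1), (0, -1), (1, 0), (1, 1)]
```

Keeping the session ID in every label lets a sampler or loss function group items by session, so positives and negatives are only compared against their own anchor.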