finetuner.data module
- class finetuner.data.CSVOptions(size=None, sampling_rate=None, dialect='auto', encoding='utf-8', is_labeled=False, convert_to_blob=True, create_point_clouds=True, point_cloud_size=2048)
Bases: object
Class containing options for reading CSV files.
- Parameters:
  - size (Optional[int]) – The number of rows that will be sampled.
  - sampling_rate (Optional[float]) – A value in [0, 1] giving the fraction of CSV lines that are kept. A sampling rate of 1 means that no lines are skipped, 0.5 means that half are skipped, and 0 means that all lines are skipped.
  - dialect (Union[str, Dialect]) – A description of the expected format of the CSV. Can be either an object of the csv.Dialect class or one of the strings returned by the csv.list_dialects() function.
  - encoding (str) – The encoding of the CSV file.
  - is_labeled (bool) – Whether the second column of the CSV represents a label that should be assigned to the item in the first column (True), or another item that should be semantically close to the first (False).
  - convert_to_blob (bool) – Whether URIs to local files should be converted to blobs.
  - create_point_clouds (bool) – Whether point clouds should be sampled from URIs to local 3D mesh files.
  - point_cloud_size (int) – The number of points sampled from a mesh to create a point cloud.
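The two forms that dialect can take can be illustrated with the standard library alone. The sketch below makes no finetuner calls; PipeDialect is a hypothetical custom dialect invented for this example, and the sample data is made up.

```python
import csv
import io

# Names of the dialects registered with the standard library;
# these are the strings the dialect parameter description refers to.
builtin_names = csv.list_dialects()


class PipeDialect(csv.Dialect):
    """Hypothetical custom dialect for pipe-separated files."""

    delimiter = "|"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL


# Parse a small in-memory file using an instance of the custom dialect.
stream = io.StringIO("apple|fruit\ncarrot|vegetable\n")
rows = list(csv.reader(stream, dialect=PipeDialect()))
```

Either a name from builtin_names or an object such as PipeDialect() matches the Union[str, Dialect] type documented above.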
- size: int | None = None
- sampling_rate: float | None = None
- dialect: str | Dialect = 'auto'
- encoding: str = 'utf-8'
- is_labeled: bool = False
- convert_to_blob: bool = True
- create_point_clouds: bool = True
- point_cloud_size: int = 2048
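How size and sampling_rate might interact can be sketched in plain Python. This is an illustration only, not the library's implementation: the subsample helper and its seed argument are hypothetical, and it assumes that rows are kept at random with probability sampling_rate and that reading stops once size rows have been kept.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def subsample(
    rows: Iterable[T],
    size: Optional[int] = None,
    sampling_rate: Optional[float] = None,
    seed: Optional[int] = None,
) -> Iterator[T]:
    """Hypothetical helper: keep rows at random, up to `size` rows in total."""
    rng = random.Random(seed)
    kept = 0
    for row in rows:
        # Stop as soon as the requested number of rows has been produced.
        if size is not None and kept >= size:
            break
        # rng.random() lies in [0, 1), so a rate of 1 keeps every row
        # and a rate of 0 keeps none.
        if sampling_rate is None or rng.random() < sampling_rate:
            yield row
            kept += 1


first_three = list(subsample(range(10), size=3))
```

Under these assumptions, a sampling rate of 1 keeps every row and a rate of 0 keeps none, matching the parameter description above.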
- finetuner.data.build_finetuning_dataset(data, model, csv_options=None)
If data has been provided as a CSV file, the given CSV file is parsed and a DocumentArray is created.
- Return type: Union[str, DocumentArray]
- finetuner.data.build_encoding_dataset(model, data)
If data has been provided as a list, a DocumentArray is created from the elements of the list.
- Return type: DocumentArray
- finetuner.data.load_finetune_data_from_csv(file, task='text-to-text', options=None)
Takes a CSV file and returns a generator of documents, with each document containing the information from one line of the CSV.
- Parameters:
  - file (Union[str, TextIO]) – Either a file path to a CSV file or a stream of one.
  - task (str) – Specifies the modalities of the model that the returned data is to be used for. This information is retrieved using the model name, and does not need to be added to the csv_options argument when calling finetuner.fit().
  - options (Optional[CSVOptions]) – A CSVOptions object.
- Return type: Generator[Document, None, None]
- Returns: A generator of Document objects. Each document represents one line of the CSV.
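The generator pattern described above, accepting either a path or a stream and yielding one record per CSV line, can be sketched with the standard library. The iter_csv_pairs function and its dictionary keys are hypothetical stand-ins; the real function yields Document objects, which this sketch does not attempt to reproduce.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Union


def iter_csv_pairs(
    file: Union[str, TextIO],
    is_labeled: bool = False,
    encoding: str = "utf-8",
) -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in: lazily yield one plain dict per CSV line."""
    # Accept either a file path or an already-open text stream.
    opened_here = isinstance(file, str)
    stream = open(file, newline="", encoding=encoding) if opened_here else file
    try:
        for row in csv.reader(stream):
            if not row:
                continue  # skip blank lines
            if len(row) < 2:
                raise ValueError(f"Expected two columns, got: {row!r}")
            if is_labeled:
                # Second column is a label for the first.
                yield {"item": row[0], "label": row[1]}
            else:
                # Second column is an item semantically close to the first.
                yield {"item": row[0], "similar_item": row[1]}
    finally:
        # Only close streams that this function opened itself.
        if opened_here:
            stream.close()


labeled = list(iter_csv_pairs(io.StringIO("cat,animal\nrose,plant\n"), is_labeled=True))
```

Because the function is a generator, nothing is read until the caller starts iterating, which keeps memory use flat for large files.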
- finetuner.data.check_columns(task, col1, col2)
Determines the expected modalities of each column using the task argument, then checks the given row of the CSV to confirm that it contains valid data.
- Parameters:
  - task (str) – The task of the model being used.
  - col1 (str) – A single value from the first column of the CSV.
  - col2 (str) – A single value from the second column of the CSV.
- Return type: Tuple[str, str]
- Returns: The expected modality of each column.
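The first step, deriving a modality per column from the task string, might look like the sketch below. It assumes that task names follow the "<modality>-to-<modality>" pattern suggested by the default 'text-to-text'; that pattern, and the expected_modalities helper itself, are assumptions made for illustration. The validation of col1 and col2 is not covered.

```python
from typing import Tuple


def expected_modalities(task: str) -> Tuple[str, str]:
    """Hypothetical helper: split a task such as 'text-to-text' into two modalities."""
    # Assumption: the task string names one modality per column, joined by '-to-'.
    parts = task.split("-to-")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"Unrecognized task: {task!r}")
    return parts[0], parts[1]


col1_modality, col2_modality = expected_modalities("text-to-text")
```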
- finetuner.data.create_document(modality, column, convert_to_blob, create_point_clouds, point_cloud_size=2048)
Checks the expected modality of the value in the given column and creates a Document with that value.
- Parameters:
  - modality (str) – The expected modality of the value in the given column.
  - column (str) – A single value of a column.
  - convert_to_blob (bool) – Whether URIs to local image files should be converted to blobs.
  - create_point_clouds (bool) – Whether point clouds should be sampled from URIs to local 3D mesh files.
  - point_cloud_size (int) – The number of points sampled from a mesh to create a point cloud.
- Return type: Document
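The modality check and the convert_to_blob behavior can be sketched with a plain dataclass. SketchDocument and sketch_create_document are hypothetical stand-ins, not the library's Document class or function, and the sketch assumes that text values are stored directly while every other modality is treated as a URI. Point cloud sampling is left out.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a Document; not the library's class."""

    text: Optional[str] = None
    uri: Optional[str] = None
    blob: Optional[bytes] = None


def sketch_create_document(
    modality: str, column: str, convert_to_blob: bool = True
) -> SketchDocument:
    # Assumption: text values are stored as-is, anything else is a URI.
    if modality == "text":
        return SketchDocument(text=column)
    doc = SketchDocument(uri=column)
    # Only local files can be read into a blob; other URIs are left untouched.
    if convert_to_blob and os.path.isfile(column):
        with open(column, "rb") as handle:
            doc.blob = handle.read()
    return doc


# Demonstrate with a temporary local file standing in for an image.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"fake-image-bytes")
image_doc = sketch_create_document("image", tmp.name)
os.remove(tmp.name)
```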