finetuner.data module#

class finetuner.data.CSVOptions(size=None, sampling_rate=None, dialect='auto', encoding='utf-8', is_labeled=False, convert_to_blob=True, create_point_clouds=True, point_cloud_size=2048)[source]#

Bases: object

Class containing options for reading CSV files.

Parameters:
  • size (Optional[int]) – The number of rows that will be sampled.

  • sampling_rate (Optional[float]) – The sampling rate, in [0, 1], indicating the fraction of CSV lines that are kept. A sampling rate of 1 means that no lines are skipped, 0.5 means that half are skipped, and 0 means that all lines are skipped.

  • dialect (Union[str, Dialect]) – A description of the expected format of the CSV. It can be either an instance of the csv.Dialect class or one of the strings returned by the csv.list_dialects() function.

  • encoding (str) – The encoding of the CSV file.

  • is_labeled (bool) – Whether the second column of the CSV represents a label that should be assigned to the item in the first column (True), or if it is another item that should be semantically close to the first (False).

  • convert_to_blob (bool) – Whether URIs to local files should be converted to blobs.

  • create_point_clouds (bool) – Whether point clouds should be sampled from URIs to local 3D mesh files.

  • point_cloud_size (int) – Determines the number of points sampled from a mesh to create a point cloud.

size: int | None = None#
sampling_rate: float | None = None#
dialect: str | Dialect = 'auto'#
encoding: str = 'utf-8'#
is_labeled: bool = False#
convert_to_blob: bool = True#
create_point_clouds: bool = True#
point_cloud_size: int = 2048#
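To illustrate how the size and sampling_rate options above interact, here is a minimal sketch using only the standard csv module. The helper sample_rows is hypothetical and not part of finetuner; the real parser may differ in details.

```python
import csv
import io
import random

def sample_rows(csv_text, size=None, sampling_rate=None, dialect="excel", seed=0):
    """Hypothetical sketch: keep each row with probability `sampling_rate`
    and stop once `size` rows have been collected."""
    rng = random.Random(seed)
    kept = []
    for row in csv.reader(io.StringIO(csv_text), dialect=dialect):
        if sampling_rate is not None and rng.random() > sampling_rate:
            continue  # this row is skipped by sampling
        kept.append(row)
        if size is not None and len(kept) >= size:
            break  # enough rows have been sampled
    return kept

# sampling_rate=1.0 keeps every row, so only `size` limits the result
sample_rows("a,1\nb,2\nc,3\nd,4\n", size=2, sampling_rate=1.0)
```

Note that with sampling_rate=1.0 no line is ever skipped, while sampling_rate=0.0 skips (almost surely) every line, matching the semantics described above.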
finetuner.data.build_finetuning_dataset(data, model, csv_options=None)[source]#

If data has been provided as a CSV file, the file is parsed and a DocumentArray is created.

Return type:

Union[str, DocumentArray]

finetuner.data.build_encoding_dataset(model, data)[source]#

If data has been provided as a list, a DocumentArray is created from the elements of the list.

Return type:

DocumentArray

finetuner.data.load_finetune_data_from_csv(file, task='text-to-text', options=None)[source]#

Takes a CSV file and returns a generator of documents, with each document containing the information from one line of the CSV.

Parameters:
  • file (Union[str, TextIO]) – Either a filepath to or a stream of a CSV file.

  • task (str) – Specifies the modalities of the model that the returned data is to be used for. This information is derived from the model name and does not need to be added to the csv_options argument when calling finetuner.fit().

  • options (Optional[CSVOptions]) – A CSVOptions object.

Return type:

Generator[Document, None, None]

Returns:

A generator of Document objects. Each document represents one element in the CSV.
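The effect of the is_labeled option on the two CSV columns can be sketched as follows. This is an illustration only: rows_to_documents is a hypothetical helper that yields plain dicts in place of Document objects and does not reflect finetuner's internals.

```python
import csv
import io

def rows_to_documents(file, is_labeled=False):
    """Hypothetical sketch of the two-column CSV semantics."""
    for col1, col2 in csv.reader(file):
        if is_labeled:
            # second column is a label assigned to the first column's item
            yield {"content": col1, "label": col2}
        else:
            # second column is an item semantically close to the first
            yield {"content": col1, "match": col2}

docs = list(
    rows_to_documents(io.StringIO("apple,fruit\ncarrot,vegetable\n"), is_labeled=True)
)
```

With is_labeled=False the same rows would instead yield pairs of semantically related items rather than item/label pairs.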

finetuner.data.check_columns(task, col1, col2)[source]#
Determines the expected modalities of each column using the task argument, then checks the given row of the CSV to confirm that it contains valid data.

Parameters:
  • task (str) – The task of the model being used.

  • col1 (str) – A single value from the first column of the CSV.

  • col2 (str) – A single value from the second column of the CSV.

Return type:

Tuple[str, str]

Returns:

The expected modality of each column
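Assuming task strings follow the "<modality>-to-<modality>" convention seen in the default 'text-to-text', the per-column modalities could be derived as in this hypothetical sketch (expected_modalities is illustrative, not finetuner's implementation):

```python
def expected_modalities(task):
    """Hypothetical sketch: split a task string such as 'text-to-image'
    into the expected modality of each CSV column."""
    left, sep, right = task.partition("-to-")
    if not sep:
        raise ValueError(f"unrecognized task format: {task!r}")
    return left, right

expected_modalities("text-to-image")
```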

finetuner.data.create_document(modality, column, convert_to_blob, create_point_clouds, point_cloud_size=2048)[source]#
Checks the expected modality of the value in the given column and creates a Document with that value.

Parameters:
  • modality (str) – The expected modality of the value in the given column

  • column (str) – A single value of a column

  • convert_to_blob (bool) – Whether URIs to local image files should be converted to blobs.

  • create_point_clouds (bool) – Whether point clouds should be sampled from URIs to local 3D mesh files.

  • point_cloud_size (int) – Determines the number of points sampled from a mesh to create a point cloud.

Return type:

Document
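The modality-dependent construction described above can be sketched with plain dicts. build_document below is a hypothetical helper, not finetuner's create_document; the real function returns a docarray Document and actually reads the file behind the URI.

```python
def build_document(modality, value, convert_to_blob=True):
    """Hypothetical sketch: image values are treated as URIs whose bytes
    may be loaded into a blob; text values are stored as-is."""
    if modality == "image":
        doc = {"uri": value, "blob": None}
        if convert_to_blob:
            # placeholder bytes; real code would read the file at `value`
            doc["blob"] = b"..."
        return doc
    return {"text": value}

build_document("text", "a short caption")
```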