finetuner.data module

class finetuner.data.CSVOptions(size=None, sampling_rate=None, dialect='auto', encoding='utf-8', is_labeled=False, convert_to_blob=True)

Bases: object

Class containing options for reading CSV files.

Parameters
  • size (Optional[int]) – The number of rows that will be sampled.

  • sampling_rate (Optional[float]) – A sampling rate in the range [0, 1] that sets the fraction of CSV lines that are kept. A sampling rate of 1 means that no lines are skipped, 0.5 means that half are skipped, and 0 means that all lines are skipped.

  • dialect (Union[str, Dialect]) – A description of the expected format of the CSV. Can be either an object of the csv.Dialect class or one of the strings returned by the csv.list_dialects() function.

  • encoding (str) – The encoding of the CSV file.

  • is_labeled (bool) – Whether the second column of the CSV represents a label that should be assigned to the item in the first column (True), or if it is another item that should be semantically close to the first (False).

  • convert_to_blob (bool) – Whether URIs to local files should be converted to blobs.

size: Optional[int] = None
sampling_rate: Optional[float] = None
dialect: Union[str, csv.Dialect] = 'auto'
encoding: str = 'utf-8'
is_labeled: bool = False
convert_to_blob: bool = True
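
The effect of sampling_rate and size can be illustrated with the standard library alone. The sketch below is an assumption about the behavior described above, not finetuner's implementation: sample_rows is a hypothetical helper that keeps each row with probability sampling_rate and stops once size rows have been collected. It also prints the dialect names that csv.list_dialects() reports.

```python
import csv
import io
import random


def sample_rows(stream, sampling_rate=None, size=None, dialect="excel", seed=0):
    """Hypothetical helper (not part of finetuner) that samples CSV rows."""
    rng = random.Random(seed)
    kept = []
    for row in csv.reader(stream, dialect=dialect):
        # Stop as soon as the requested number of rows has been collected.
        if size is not None and len(kept) >= size:
            break
        # rng.random() is in [0, 1): a rate of 1 keeps every row, 0 keeps none.
        if sampling_rate is not None and rng.random() >= sampling_rate:
            continue
        kept.append(row)
    return kept


data = "apple,fruit\ncarrot,vegetable\npear,fruit\n"

# Dialect names registered with the standard library's csv module.
print(csv.list_dialects())
print(sample_rows(io.StringIO(data), sampling_rate=1.0))
print(sample_rows(io.StringIO(data), sampling_rate=0.0))
print(sample_rows(io.StringIO(data), size=2))
```

With a rate strictly between 0 and 1 the kept rows depend on the random seed, so only the two boundary values are shown here.
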
finetuner.data.build_finetuning_dataset(data, model, csv_options=None)

If data has been provided as a CSV file, the given CSV file is parsed and a DocumentArray is created.

Return type

Union[str, DocumentArray]

finetuner.data.build_encoding_dataset(model, data)

If data has been provided as a list, a DocumentArray is created from the elements of the list.

Return type

DocumentArray

finetuner.data.load_finetune_data_from_csv(file, task='text-to-text', options=None)

Takes a CSV file and returns a generator of documents, with each document containing the information from one line of the CSV.

Parameters
  • file (Union[str, TextIO]) – Either a path to a CSV file or a stream of one.

  • task (str) – Specifies the modalities of the model that the returned data is to be used for. This value is retrieved using the model name, so it does not need to be added to the csv_options argument when calling finetuner.fit().

  • options (Optional[CSVOptions]) – A CSVOptions object.

Return type

Generator[Document, None, None]

Returns

A generator of Document objects. Each document represents one line of the CSV.
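
The row-by-row generator pattern described above can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the library's code: iter_csv_pairs and _pairs are hypothetical names, plain dictionaries stand in for Document objects, and the key names ("item", "label", "positive") are invented for the example.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Union


def _pairs(stream: TextIO, is_labeled: bool) -> Iterator[Dict[str, str]]:
    for row in csv.reader(stream):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        # The second column is either a label or a semantically close item.
        second_key = "label" if is_labeled else "positive"
        yield {"item": row[0], second_key: row[1]}


def iter_csv_pairs(
    file: Union[str, TextIO], is_labeled: bool = False, encoding: str = "utf-8"
) -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in: yields one record per CSV line, lazily."""
    if isinstance(file, str):
        # A string is treated as a path; the file stays open while iterating.
        with open(file, newline="", encoding=encoding) as handle:
            yield from _pairs(handle, is_labeled)
    else:
        yield from _pairs(file, is_labeled)


for record in iter_csv_pairs(io.StringIO("cat,kitten\ndog,puppy\n")):
    print(record)
```

Because the function is a generator, no line is read until the caller starts iterating, which keeps memory use flat for large files.
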

finetuner.data.check_columns(task, col1, col2)

Determines the expected modalities of each column using the task argument, then checks the given row of the CSV to confirm that it contains valid data.

Parameters
  • task (str) – The task of the model being used.

  • col1 (str) – A single value from the first column of the CSV.

  • col2 (str) – A single value from the second column of the CSV.

Return type

Tuple[str, str]

Returns

The expected modality of each column.
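
One way to picture this check is shown below. It is a sketch under a loud assumption: task names are taken to follow the pattern "<modality>-to-<modality>", as the default value 'text-to-text' suggests. The helpers expected_modalities and check_row are hypothetical, and the validity test (rejecting empty values) is only a placeholder for whatever the library actually verifies.

```python
from typing import Tuple


def expected_modalities(task: str) -> Tuple[str, str]:
    """Hypothetical helper: split a task name into its two modalities."""
    left, sep, right = task.partition("-to-")
    if not sep or not left or not right:
        raise ValueError(f"Unrecognized task name: {task!r}")
    return left, right


def check_row(task: str, col1: str, col2: str) -> Tuple[str, str]:
    """Hypothetical helper: derive modalities, then validate one CSV row."""
    modalities = expected_modalities(task)
    for value in (col1, col2):
        # Placeholder validity rule (an assumption): values must be non-empty.
        if not value.strip():
            raise ValueError("CSV row contains an empty value")
    return modalities


print(check_row("text-to-text", "a red apple", "a crimson fruit"))
```
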

finetuner.data.create_document(modality, column, convert_to_blob)

Checks the expected modality of the value in the given column and creates a Document with that value.

Parameters
  • modality (str) – The expected modality of the value in the given column.

  • column (str) – A single value of a column.

  • convert_to_blob (bool) – Whether URIs to local files should be converted to blobs.

Return type

Document
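
The role of convert_to_blob can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: make_record is a hypothetical function, a dictionary stands in for Document, and "converting to a blob" is assumed to mean reading the local file's bytes.

```python
import os
import tempfile


def make_record(modality: str, value: str, convert_to_blob: bool = True) -> dict:
    """Hypothetical stand-in for building a document from one column value."""
    if modality == "text":
        return {"modality": modality, "text": value}
    # For non-text modalities the value is treated as a URI or local path.
    if convert_to_blob and os.path.isfile(value):
        with open(value, "rb") as handle:
            return {"modality": modality, "blob": handle.read()}
    # Remote URIs, or local paths when conversion is disabled, stay as URIs.
    return {"modality": modality, "uri": value}


# Write a small temporary file to act as a local image.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"\x89PNG")
    path = tmp.name

print(make_record("text", "hello"))
print(make_record("image", path))
print(make_record("image", path, convert_to_blob=False))
os.remove(path)
```
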