finetuner.data module
- class finetuner.data.CSVOptions(size=None, sampling_rate=None, dialect='auto', encoding='utf-8', is_labeled=False, convert_to_blob=True, create_point_clouds=True, point_cloud_size=2048)
Bases: object
Class containing options for reading CSV files.
- Parameters:
size (Optional[int]) – The number of rows that will be sampled.
sampling_rate (Optional[float]) – The sampling rate, in the range [0, 1], giving the fraction of CSV lines that are kept. A sampling rate of 1 means that no lines are skipped, 0.5 means that half are skipped, and 0 means that all lines are skipped.
dialect (Union[str, Dialect]) – A description of the expected format of the CSV. Can be either an object of the csv.Dialect class or one of the strings returned by the csv.list_dialects() function.
encoding (str) – The encoding of the CSV file.
is_labeled (bool) – Whether the second column of the CSV represents a label that should be assigned to the item in the first column (True), or another item that should be semantically close to the first (False).
convert_to_blob (bool) – Whether URIs to local files should be converted to blobs.
create_point_clouds (bool) – Whether point clouds should be sampled from URIs to local 3D mesh files.
point_cloud_size (int) – The number of points sampled from a mesh to create a point cloud.
- size: int | None = None
- sampling_rate: float | None = None
- dialect: str | Dialect = 'auto'
- encoding: str = 'utf-8'
- is_labeled: bool = False
- convert_to_blob: bool = True
- create_point_clouds: bool = True
- point_cloud_size: int = 2048
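To make the dialect and sampling_rate options more concrete, the following is a minimal sketch that uses only the Python standard library and does not call finetuner. The helper sample_rows and the sample rows are hypothetical, and keeping each row independently with probability sampling_rate is an assumption made for illustration; the library may sample differently.

```python
import csv
import random

# Dialect names registered with the standard library. The dialect option is
# documented as accepting any of these strings or a csv.Dialect object.
registered = csv.list_dialects()


def sample_rows(rows, sampling_rate, seed=0):
    """Keep each row with probability `sampling_rate` (illustrative assumption)."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so a rate of 1 keeps every row
    # and a rate of 0 keeps none.
    return [row for row in rows if rng.random() < sampling_rate]


# Invented two-column rows standing in for the contents of a CSV file.
rows = [[f"item {i}", f"label {i % 3}"] for i in range(100)]

kept_all = sample_rows(rows, 1.0)
kept_none = sample_rows(rows, 0.0)
kept_half = sample_rows(rows, 0.5)
```

With a rate of 0.5 the number of kept rows varies with the seed, which is why the sketch fixes one.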
- class finetuner.data.LabeledCSVParser(file, task, options=None)
Bases: _CSVParser
Parser for CSVs with two columns, where the first column is the data and the second column is the label. To use this parser, make sure the CSV contains two columns and that is_labeled=True.
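The two-column labeled layout can be illustrated with a short sketch. It reads an in-memory CSV with the standard library csv module rather than with LabeledCSVParser itself, and the sample rows are invented for illustration.

```python
import csv
import io

# Invented example: first column is the data, second column is its label.
labeled_csv = io.StringIO(
    "This is an apple,fruit\n"
    "This is a carrot,vegetable\n"
    "This is a pear,fruit\n"
)

# Each row unpacks into a (data, label) pair.
pairs = [(data, label) for data, label in csv.reader(labeled_csv)]

# The distinct labels found in the second column.
labels = sorted({label for _, label in pairs})
```

Rows that share a label (here the apple and the pear) are the ones a labeled dataset treats as belonging to the same class.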
- class finetuner.data.QueryDocumentRelationsParser(file, task, options=None)
Bases: _CSVParser
Parser for the case where the user does not have explicitly annotated labels but rather a set of query-document pairs, each expressing that a document is relevant to a query, or pairs that form a text-image pair.
- class finetuner.data.PairwiseScoreParser(file, task, options=None)
Bases: _CSVParser
Parser for CSVs with three columns: column1, column2, and a float value indicating the similarity between column1 and column2.
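As an illustration of the three-column layout, the sketch below reads scored pairs with the standard library csv module. It does not use PairwiseScoreParser, and the rows and scores are invented.

```python
import csv
import io

# Invented example: two items followed by a similarity score.
scored_csv = io.StringIO(
    "red shoes,crimson sneakers,0.9\n"
    "red shoes,blue hat,0.1\n"
)

# The third column is text in the file, so it is converted to float here.
triples = [(a, b, float(score)) for a, b, score in csv.reader(scored_csv)]
```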
- class finetuner.data.DataSynthesisParser(file, task, options=None)
Bases: _CSVParser
Parser for CSVs with either one column or one row. Each item in the CSV represents a single document, so the structure of the CSV file is not important.
- class finetuner.data.CSVContext(model=None, options=None)
Bases: object
A CSV context switch class with conditions to parse CSVs into DocumentArray.
- Parameters:
model (Optional[str]) – The model being used, from which the model stub and associated task are obtained.
options (Optional[CSVOptions]) – An instance of CSVOptions.
- finetuner.data.get_csv_file_context(file, encoding)
Get the CSV file context, such as file_ctx, the CSV dialect, and the number of columns.
- finetuner.data.get_csv_file_dialect_columns(file, encoding)
Get the CSV dialect and the number of columns of the CSV.
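One way to detect a dialect and count columns is csv.Sniffer from the standard library. The sketch below shows that general technique only; whether finetuner uses it is an assumption, and the helper name sniff_dialect_and_columns and the sample text are hypothetical.

```python
import csv
import io


def sniff_dialect_and_columns(text):
    """Guess the CSV dialect of `text` and count the columns in its first row."""
    # Restricting the candidate delimiters keeps the guess predictable.
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
    first_row = next(csv.reader(io.StringIO(text), dialect))
    return dialect, len(first_row)


# Invented semicolon-separated sample with three columns.
sample = (
    "query;document;score\n"
    "red shoes;crimson sneakers;0.9\n"
    "red shoes;blue hat;0.1\n"
)

dialect, num_columns = sniff_dialect_and_columns(sample)
```

The column count can then be used to tell a two-column file from a three-column file of scored pairs.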
- finetuner.data.build_encoding_dataset(model, data)
If data has been provided as a list, a DocumentArray is created from the elements of the list.
- Return type:
DocumentArray
- finetuner.data.check_columns(task, col1, col2)
Determines the expected modalities of each column using the task argument, then checks the given row of the CSV to confirm that it contains valid data.
- Parameters:
task (str) – The task of the model being used.
col1 (str) – A single value from the first column of the CSV.
col2 (str) – A single value from the second column of the CSV.
- Return type:
Tuple[str, str]
- Returns:
The expected modality of each column.
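The first step, deriving a pair of modalities from the task, might look like the sketch below. It assumes task names of the form '<modality>-to-<modality>', such as 'text-to-image'; that naming pattern and the helper expected_modalities are assumptions made for illustration, not finetuner's implementation, and the validation of the row values is not shown.

```python
from typing import Tuple


def expected_modalities(task: str) -> Tuple[str, str]:
    """Split a task name such as 'text-to-image' into its two modalities."""
    first, sep, second = task.partition("-to-")
    # Reject names that do not follow the assumed '<a>-to-<b>' pattern.
    if not sep or not first or not second:
        raise ValueError(f"unrecognised task name: {task!r}")
    return first, second


text_image = expected_modalities("text-to-image")
text_text = expected_modalities("text-to-text")
```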
- finetuner.data.create_document(modality, column, convert_to_blob, create_point_clouds, point_cloud_size=2048)
Checks the expected modality of the value in the given column and creates a Document with that value.
- Parameters:
modality (str) – The expected modality of the value in the given column.
column (str) – A single value of a column.
convert_to_blob (bool) – Whether URIs to local image files should be converted to blobs.
create_point_clouds (bool) – Whether point clouds should be sampled from URIs to local 3D mesh files.
point_cloud_size (int) – The number of points sampled from a mesh to create a point cloud.
- Return type:
Document