finetuner.toydata module

finetuner.toydata.generate_qa(num_total=481, num_neg=0, pos_value=1, neg_value=-1, is_testset=None)

Get a generator of QA data with synthetic negative matches.

Each document in the array has its text saved in the text attribute, and each match has its label saved as a tag under tags['finetuner__label'].

Parameters
  • num_total (int) – the total number of documents to return

  • num_neg (int) – the number of negative matches per document

  • pos_value (int) – the label value of the positive matches

  • neg_value (int) – the label value of the negative matches

  • max_seq_len – the maximum sequence length of each text.

  • is_testset (Optional[bool]) – whether to generate test data; if set to None, all data is returned

Return type

DocumentArray
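
The labeling scheme described above can be illustrated with a minimal sketch. It does not call finetuner: toy_qa_docs is a hypothetical helper, the question/answer pairs are made up, and plain dicts stand in for documents. It also assumes one positive match per document and negatives drawn from the answers to other questions, which the text above does not state.

```python
import random


def toy_qa_docs(pairs, num_neg=0, pos_value=1, neg_value=-1, seed=0):
    """Build plain-dict stand-ins for documents with labeled matches.

    Hypothetical helper for illustration only; not part of finetuner.
    """
    rng = random.Random(seed)
    docs = []
    for i, (question, answer) in enumerate(pairs):
        # Assumption: each document carries one positive match (its true answer).
        matches = [{"text": answer, "tags": {"finetuner__label": pos_value}}]
        # Assumption: negatives are answers that belong to other questions.
        others = [a for j, (_, a) in enumerate(pairs) if j != i]
        for wrong in rng.sample(others, min(num_neg, len(others))):
            matches.append({"text": wrong, "tags": {"finetuner__label": neg_value}})
        docs.append({"text": question, "matches": matches})
    return docs


pairs = [("q1", "a1"), ("q2", "a2"), ("q3", "a3")]
docs = toy_qa_docs(pairs, num_neg=2)
labels = [m["tags"]["finetuner__label"] for m in docs[0]["matches"]]
```

With num_neg=2, the first document ends up with one match labeled pos_value followed by two labeled neg_value.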

finetuner.toydata.generate_fashion(num_total=60000, upsampling=1, channels=0, channel_axis=-1, is_testset=False, download_proxy=None)

Get a generator of Fashion-MNIST Documents.

Each document in the array has its image content saved as blob, and its label saved as a tag under tags['finetuner__label'].

Parameters
  • num_total (int) – the total number of documents to return

  • upsampling (int) – the rescale factor; must be an integer >= 1. It scales each 28 x 28 image up to a larger one. For example, upsampling=2 gives 56 x 56 images.

  • channels (int) – Fashion-MNIST data is grayscale and has no channel dimension. Set channels to 1 or 3 to simulate a real grayscale or RGB image.

  • channel_axis (int) – the axis for channels, e.g. PyTorch expects B*C*W*H, so the channel axis should be 1.

  • is_testset (bool) – whether to generate test data

Return type

DocumentArray
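
How upsampling, channels and channel_axis combine can be worked through as shape arithmetic. The sketch below is not finetuner code: expected_batch_shape is a hypothetical helper, and it assumes that channel_axis is counted in the batched layout (as in the B*C*W*H example above) and that channels=0 means no channel dimension is added.

```python
def expected_batch_shape(batch, upsampling=1, channels=0, channel_axis=-1):
    """Return the assumed shape of a batch of images as a tuple.

    Hypothetical helper for illustration only; not part of finetuner.
    """
    if upsampling < 1:
        raise ValueError("upsampling must be an integer >= 1")
    side = 28 * upsampling  # Fashion-MNIST images are 28 x 28
    shape = [batch, side, side]
    if channels > 0:
        ndim = len(shape) + 1  # number of dimensions once the channel axis exists
        if not -ndim <= channel_axis < ndim:
            raise ValueError("channel_axis is out of range")
        # Normalize a negative axis, then insert the channel dimension there.
        shape.insert(channel_axis % ndim, channels)
    return tuple(shape)


channels_first = expected_batch_shape(8, upsampling=2, channels=3, channel_axis=1)
channels_last = expected_batch_shape(8, channels=1)
no_channel = expected_batch_shape(8)
```

Under these assumptions, channel_axis=1 yields a channels-first layout, while the default channel_axis=-1 places the channel dimension last.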