Dataset download

The teklia-qwen dataset download command accepts the following arguments:

  • the path to an extracted dataset, passed as a positional argument,

  • an optional maximum width, set with --max-width: images wider than this value are downsized,

  • an optional maximum height, set with --max-height: images taller than this value are downsized.
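The size rule behind --max-width and --max-height can be sketched as a small function. This is a minimal sketch under stated assumptions, not the tool's implementation: the helper name fit_within is hypothetical, and it assumes that images are scaled uniformly (aspect ratio preserved), never upscaled, and that an omitted limit means no limit on that axis.

```python
from fractions import Fraction


def fit_within(width, height, max_width=None, max_height=None):
    """Hypothetical helper: return (new_width, new_height) fitting the limits.

    Assumptions (not confirmed by the teklia-qwen docs): uniform scaling that
    preserves the aspect ratio, no upscaling, and None meaning "no limit".
    """
    # Fraction keeps the arithmetic exact, so the limiting side lands
    # exactly on the limit instead of drifting by a float rounding error.
    scale = Fraction(1)
    if max_width is not None and width > max_width:
        scale = min(scale, Fraction(max_width, width))
    if max_height is not None and height > max_height:
        scale = min(scale, Fraction(max_height, height))
    if scale == 1:
        return width, height  # already within both limits: left untouched
    # int() truncates, so neither side exceeds its limit; keep at least 1 px.
    return max(1, int(width * scale)), max(1, int(height * scale))


print(fit_within(4000, 3000, max_width=2000, max_height=2000))  # (2000, 1500)
```

Under these assumptions, a 4000×3000 image with both limits at 2000 becomes 2000×1500, while an 800×600 image is left as-is.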

The dataset is written in JSONL format (one JSON object per line), with one file per set. Each line holds one example of that set, with the following keys:

  • system: the system prompt,

  • query: the prompt, in which images are referenced using <imageX> tokens,

  • response: the expected response of the model,

  • images: the paths to the images given as input to the model.
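To make the format concrete, the sketch below parses JSONL lines and checks each example's keys. The record contents (prompt text, image path, and the spelling <image0>) are hypothetical and purely illustrative. The token check assumes that X in <imageX> is a number and that the query holds one token per entry in images; neither detail is confirmed by this page.

```python
import json
import re

REQUIRED_KEYS = {"system", "query", "response", "images"}
# Assumption: <imageX> tokens look like <image0>, <image1>, ...
IMAGE_TOKEN = re.compile(r"<image\d+>")


def read_examples(lines):
    """Parse JSONL lines into dicts and check the documented keys."""
    examples = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        example = json.loads(line)
        missing = REQUIRED_KEYS - example.keys()
        if missing:
            raise ValueError(f"line {number}: missing keys {sorted(missing)}")
        # Assumption: one image token in the query per listed image path.
        if len(IMAGE_TOKEN.findall(example["query"])) != len(example["images"]):
            raise ValueError(f"line {number}: image tokens and paths differ")
        examples.append(example)
    return examples


# Hypothetical record, for illustration only.
sample = {
    "system": "You are a helpful assistant.",
    "query": "<image0>Transcribe the text in this image.",
    "response": "Lorem ipsum",
    "images": ["images/page_001.jpg"],
}
records = read_examples((json.dumps(sample) + "\n").splitlines())
print(records[0]["images"])  # ['images/page_001.jpg']
```

Because read_examples accepts any iterable of lines, a generated file can be passed directly as an open file object, for example read_examples(open(path, encoding="utf-8")), where path is whichever set file the command produced.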

Examples

The following command downloads a dataset and limits image width and height to 2000 pixels:

teklia-qwen dataset download dataset/ \
    --max-width 2000 \
    --max-height 2000