Dataset download
The teklia-qwen dataset download command takes the following arguments:
- the path to an extracted dataset,
- a maximum width, provided through the optional --max-width argument; images wider than this value are downsized,
- a maximum height, provided through the optional --max-height argument; images taller than this value are downsized.
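The effect of the two size limits can be illustrated with a small sketch. This is not the tool's implementation: the fit_within helper is hypothetical, and it assumes that downsizing preserves the aspect ratio and that an omitted limit leaves that dimension unconstrained, neither of which is stated above.

```python
from typing import Optional, Tuple


def fit_within(
    width: int,
    height: int,
    max_width: Optional[int] = None,
    max_height: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the size an image would have after downsizing.

    Hypothetical helper. Assumption: the aspect ratio is preserved and
    images already within the limits are left untouched.
    """
    scale = 1.0
    # Only shrink when the image exceeds a limit that was actually given.
    if max_width is not None and width > max_width:
        scale = min(scale, max_width / width)
    if max_height is not None and height > max_height:
        scale = min(scale, max_height / height)
    # Never return a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(2000, 1000, max_width=1000))
print(fit_within(800, 600, max_width=1000, max_height=1000))
```

Under these assumptions, an image that exceeds both limits is scaled by the more restrictive of the two ratios, so the result fits within both.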
The dataset is generated in JSONL format, with one file per set. Each row corresponds to one example of that set. Keys are:
- system: the system prompt,
- query: the prompt (images are added using the <imageX> tokens),
- response: the expected response of the model,
- images: the paths to the images given as input to the model.
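As a rough illustration of the row format, the sketch below parses JSONL lines and checks that each row carries the four keys listed above. The parse_rows helper, the sample prompt text, the image path, and the <image1> spelling of the <imageX> token are all made up for the example; only the key names come from this page.

```python
import json

# Key names taken from the list above.
EXPECTED_KEYS = {"system", "query", "response", "images"}


def parse_rows(lines):
    """Parse JSONL lines and check that every row has the expected keys.

    Hypothetical helper, not part of teklia-qwen.
    """
    rows = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        missing = EXPECTED_KEYS - row.keys()
        if missing:
            raise ValueError(f"line {number}: missing keys {sorted(missing)}")
        rows.append(row)
    return rows


# Made-up row: every value here is illustrative, not real tool output.
sample = json.dumps({
    "system": "You are a transcription assistant.",
    "query": "<image1> Transcribe the text.",
    "response": "Hello world",
    "images": ["images/page_001.jpg"],
})

rows = parse_rows([sample])
print(rows[0]["images"])
```

To check a generated file, the same function can be given an open file handle, since iterating over a text file yields one line at a time.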