Publish PyLaia dataset to HuggingFace
To publish your PyLaia dataset on the Hugging Face Datasets Hub, first use the `convert` subcommand to create the Parquet files.
```shell
atr convert \
  --folder <path/to/folder> \
  --images <path/to/images> \
  --output <path/to/output> \
  --image-ext <ext>
```

- `--folder`: directory where your split files (`train.txt`, ...) are stored.
- `--images`: directory where the images are stored.
- `--output`: directory where the Parquet files will be saved.
- `--image-ext`: optional, in case your images are not in `.jpg` format.
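For instance, a hypothetical invocation could look like this (the paths and extension below are placeholders, not defaults of the tool):

```shell
# Hypothetical example: split files and images live under ./my-dataset,
# and the images are PNG files rather than the default JPEG
atr convert \
  --folder ./my-dataset/splits \
  --images ./my-dataset/images \
  --output ./my-dataset/parquet \
  --image-ext .png
```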
This command generates three Parquet files, one per split. This publication method does not accept `val` as the name of the validation split, so the file for this split is named `validation.parquet` instead.
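You can quickly check the result with a directory listing, using the same placeholder output path as above:

```shell
ls <path/to/output>
# train.parquet  validation.parquet  test.parquet
```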
You can then publish these three files in a `data` folder in your dataset repository.
The final structure of the dataset repository on Hugging Face should be:
```
├── data
│   ├── train.parquet
│   ├── validation.parquet
│   └── test.parquet
├── .gitattributes
└── README.md
```
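One way to push the files into this layout is the `huggingface-cli upload` command from the `huggingface_hub` package. A minimal sketch, assuming you are already logged in (`huggingface-cli login`) and that `my-user/my-pylaia-dataset` is a placeholder for your own dataset repository:

```shell
# Upload the local Parquet files into the repository's data/ folder.
# "my-user/my-pylaia-dataset" and <path/to/output> are placeholders.
huggingface-cli upload my-user/my-pylaia-dataset <path/to/output> data --repo-type dataset
```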
See an example.