ATR Data Generator

Extract datasets from Arkindex, a platform developed by Teklia, to train Automatic Text Recognition (ATR) pipelines.

Installing this Python package makes the atr-data-generator command available. To learn more about it and its subcommands, run atr-data-generator --help.
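As a quick check after installation, the following Python sketch looks up the command on the PATH and, if it is found, returns its help text. The helper name `show_help` is hypothetical; only the `atr-data-generator` command and its `--help` flag come from this README.

```python
import shutil
import subprocess
from typing import Optional

COMMAND = "atr-data-generator"


def show_help(command: str = COMMAND) -> Optional[str]:
    """Return the help text of `command`, or None if it is not on the PATH."""
    executable = shutil.which(command)
    if executable is None:
        # The package is probably not installed in the active environment.
        return None
    result = subprocess.run(
        [executable, "--help"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    help_text = show_help()
    print(help_text if help_text is not None else f"{COMMAND} is not installed")
```

If the function returns None, the command is not visible from the current environment, which usually means the package was installed in a different virtual environment.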

Subcommands read a YAML configuration file, provided via the --config parameter. The structure of this configuration file is described in its dedicated section. Every run exports both a config.yaml file and a param.json file, which can be used to reproduce the data generation.
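To illustrate how a run might be scripted and its exported parameters inspected afterwards, here is a minimal Python sketch. Several points are assumptions not confirmed by this README: the subcommand name is a placeholder, `--config` is assumed to follow the subcommand, and `param.json` is assumed to sit in an output directory you already know. The helpers `build_command` and `load_exported_params` are hypothetical; check atr-data-generator --help for the actual usage.

```python
import json
from pathlib import Path
from typing import Any, List


def build_command(subcommand: str, config_path: Path) -> List[str]:
    """Build the argument list for one run.

    Assumption: --config is passed after the subcommand name.
    """
    return ["atr-data-generator", subcommand, "--config", str(config_path)]


def load_exported_params(output_dir: Path) -> Any:
    """Read the param.json file exported by a previous run.

    Assumption: the file is located directly inside `output_dir`.
    """
    with (output_dir / "param.json").open(encoding="utf-8") as handle:
        return json.load(handle)
```

The list returned by `build_command` can be passed to `subprocess.run`. Keeping the exported config.yaml next to the generated data makes it straightforward to pass the same file back through --config when a dataset has to be regenerated.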

See the Development section to learn how to contribute to this project.