Dataset extraction
The extract subcommand is used to extract data from Arkindex. This will create:
- images/, a folder with the images that need to be transcribed,
- labels.json, a JSON file where each image is linked to its transcription.
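After an extraction, it can be useful to confirm that both outputs are present. The sketch below is a hypothetical helper, not part of atr-data-generator. It assumes that images/ and labels.json are written side by side in one output directory, which this page does not state, so adjust the paths to your setup.

```shell
# Hypothetical helper (not part of atr-data-generator).
# Assumption: images/ and labels.json sit in the same output directory.
check_extract_outputs() {
    output_dir="$1"
    # The images/ folder must exist.
    if [ ! -d "$output_dir/images" ]; then
        echo "missing folder: $output_dir/images" >&2
        return 1
    fi
    # labels.json must exist and be non-empty.
    if [ ! -s "$output_dir/labels.json" ]; then
        echo "missing or empty file: $output_dir/labels.json" >&2
        return 1
    fi
    return 0
}
```

Calling check_extract_outputs with the output directory returns 0 when both outputs are found and 1 otherwise.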
The full command is:
atr-data-generator extract \
    --config path/to/configuration.yaml \
    --database-path path/to/db.sqlite
Both of these arguments are required:
- --config, the path to the configuration file,
- --database-path, the path to the Arkindex SQLite export of the corpus.
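Since both arguments are required file paths, a small pre-flight check can catch a wrong path before the extraction starts. This is only a sketch: check_extract_inputs is a hypothetical helper name, and the check covers file existence only, not whether the contents are valid.

```shell
# Hypothetical helper (not part of atr-data-generator): verify that the two
# required files exist before calling the extract subcommand.
check_extract_inputs() {
    config_path="$1"
    database_path="$2"
    if [ ! -f "$config_path" ]; then
        echo "configuration file not found: $config_path" >&2
        return 1
    fi
    if [ ! -f "$database_path" ]; then
        echo "database export not found: $database_path" >&2
        return 1
    fi
    return 0
}

# Possible usage, with the same placeholder paths as the command above:
#   check_extract_inputs path/to/configuration.yaml path/to/db.sqlite && \
#       atr-data-generator extract \
#           --config path/to/configuration.yaml \
#           --database-path path/to/db.sqlite
```

Chaining with && means the extraction is only attempted when both files are found.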
More details about the configuration file are given in the Dataset extraction section.