Training

Description

Use the teklia-layout-reader train command to train your own model.

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| --config | Path to the training configuration (YAML format). | pathlib.Path | |

Expected configuration

Many parameters can be adjusted through the configuration file.

Basic parameters

| Parameter | Description | Type | Recommended value |
| --- | --- | --- | --- |
| model_dir | Path to the local or Hugging Face model to fine-tune. | str | hantian/layoutreader or any other specialized model. |
| dataset_dir | Path to the local dataset. | Path | The dataset extracted following this recipe. |
| sort_method | Sorting method for zone order initialization. | str | Use "sortxy_by_column" for documents with many columns, otherwise "sortxy". |
| sort_ratio | Fraction of training documents whose zones will be shuffled. The remaining documents will follow sort_method. | float | Use 0.5 by default. |
| with_classes | Whether to include zone class labels. | bool | Should be enabled (true) by default. Disable only if you don’t have access to zone classes. |
| with_separators | Whether to include horizontal and vertical separators as additional inputs. | bool | Should be enabled (true) by default. Disable only if there are no separators, or if you think they are not helpful to predict the reading order. |
| per_device_train_batch_size | Training batch size per GPU. | int | Use 4 by default and adjust depending on your GPU spec. |
| per_device_eval_batch_size | Evaluation batch size per GPU. | int | Use 4 by default and adjust depending on your GPU spec. |
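
To make the interaction between sort_ratio and sort_method concrete, here is a minimal Python sketch of one possible reading of their descriptions. It is not the tool's implementation: the function name initial_zone_order, the zone dictionaries with "x" and "y" keys, the per-document random draw, and the top-to-bottom, left-to-right sort key are all assumptions made for illustration.

```python
import random


def initial_zone_order(zones, sort_ratio, rng):
    """Return the initial zone order for one training document (illustrative)."""
    zones = list(zones)  # copy so the caller's list is left untouched
    if rng.random() < sort_ratio:
        # This document falls in the shuffled fraction: random initial order.
        rng.shuffle(zones)
        return zones
    # Placeholder for sort_method (assumption): top-to-bottom, then left-to-right.
    return sorted(zones, key=lambda zone: (zone["y"], zone["x"]))


rng = random.Random(0)  # seeded so that runs are reproducible
page = [{"x": 5, "y": 10}, {"x": 0, "y": 0}, {"x": 3, "y": 0}]
ordered = initial_zone_order(page, sort_ratio=0.5, rng=rng)
```

Under this reading, a sort_ratio of 0.0 would always apply the sorting method and a value of 1.0 would always shuffle. A per-document draw only matches the requested fraction on average; the actual tool may select documents differently.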

SFTTrainer parameters

| Parameter | Description | Type | Recommended value |
| --- | --- | --- | --- |
| output_dir | Output folder where checkpoints will be saved. | str | |
| per_device_train_batch_size | Training batch size per GPU. | int | Use 4 by default and adjust depending on your GPU capabilities. |
| per_device_eval_batch_size | Evaluation batch size per GPU. | int | Use 4 by default and adjust depending on your GPU capabilities. |

For more advanced usage, see the full configuration file and the SFTTrainer class page.
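
As a starting point, the parameters described above could be collected and written to a YAML file with a short Python helper such as the sketch below. The flat key layout, the placeholder paths, and the helper names (check_config, to_yaml, write_config) are assumptions for illustration; the full configuration file remains the reference for the actual structure.

```python
import json
from pathlib import Path

# Values follow the recommendations above; both paths are hypothetical placeholders.
config = {
    "model_dir": "hantian/layoutreader",
    "dataset_dir": "path/to/dataset",
    "sort_method": "sortxy",
    "sort_ratio": 0.5,
    "with_classes": True,
    "with_separators": True,
    "output_dir": "path/to/output",
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
}


def check_config(cfg):
    """Basic sanity checks (assumed constraints, not enforced by this sketch's source)."""
    if not 0.0 <= cfg["sort_ratio"] <= 1.0:
        raise ValueError("sort_ratio is a fraction and must lie in [0, 1]")
    for key in ("per_device_train_batch_size", "per_device_eval_batch_size"):
        if cfg[key] < 1:
            raise ValueError(f"{key} must be a positive integer")


def to_yaml(cfg):
    """Serialize a flat mapping; JSON scalars are also valid YAML scalars."""
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in cfg.items())


def write_config(path, cfg=config):
    """Validate the mapping, then write it to the given path as YAML."""
    check_config(cfg)
    Path(path).write_text(to_yaml(cfg), encoding="utf-8")
```

Calling write_config("my_awesome_config.yaml") would produce a file that can be passed to --config, provided the real configuration uses the same flat layout.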

Examples

Train a LayoutReader model

This command will train a model on a given dataset.

teklia-layout-reader train --config my_awesome_config.yaml