Decoding
The `pylaia-htr-decode-ctc` command can be used to predict using a trained PyLaia model. To learn more about the options of this command, use `pylaia-htr-decode-ctc --help`.
Purpose
This command uses a trained PyLaia model to predict on a dataset.
It requires:

- the pickled `model` file created during model initialization,
- the weights `*.ckpt` of the trained model created during model training.
Parameters
The full list of parameters is detailed in this section.
General parameters
| Parameter | Description | Type | Default |
|---|---|---|---|
| `syms` | Positional argument. Path to a file mapping characters to integers. The CTC symbol must be mapped to integer 0. | | |
| `img_list` | Positional argument. File containing the names of the images to decode (one image per line). | | |
| `img_dirs` | Directories containing line images. | | |
| `config` | Path to a JSON configuration file. | | |
Common parameters
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Directory where the model will be saved. | | |
| `common.model_filename` | Filename of the model. | | |
| `common.experiment_dirname` | Directory name of the experiment. | | |
| | Checkpoint to load. Must be a filepath, a filename, a glob pattern or | | |
Data arguments
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Batch size. | | |
| | Color mode. Must be either | | |
| | Number of worker processes created in dataloaders. | | |
| `data.reading_order` | Reading order on the input lines: LTR (Left-to-Right) or RTL (Right-to-Left). | | |
Decode arguments
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Include the associated image ids in the decoding/segmentation output. | | |
| | String to use as a separator between the image ids and the decoding/segmentation output. | | `" "` |
| `decode.join_string` | String to use to join the decoding output. | | `" "` |
| | Convert the decoding output to symbols instead of symbol indices. | | |
| `decode.convert_spaces` | Whether or not to convert spaces. | | |
| `decode.input_space` | Replace the space by this symbol if | | `<space>` |
| | Space symbol to display during decoding. | | `" "` |
| `decode.segmentation` | Use CTC alignment to estimate character or word segmentation. Should be `char` or `word`. | | `None` |
| `decode.temperature` | Temperature parameter used to scale the logits. | | |
| `decode.print_line_confidence_score` | Whether to print line confidence scores. | | |
| `decode.print_word_confidence_score` | Whether to print word confidence scores. | | |
| `decode.use_language_model` | Whether to decode with an external language model. | | |
| `decode.language_model_path` | Path to a KenLM or ARPA n-gram language model. | | |
| `decode.language_model_weight` | Weight of the language model. | | |
| `decode.tokens_path` | Path to a file containing valid tokens. If using a file, the expected format is for tokens mapping to the same index to be on the same line. The | | |
| `decode.lexicon_path` | Path to a lexicon file containing the possible words and corresponding spellings. | | |
| | String representing unknown characters. | | |
| | String representing the blank/CTC symbol. | | |
Logging arguments
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Logging format. | | |
| | Logging level. Should be in | | |
| | Filepath for the logs file. Can be a filepath or a filename to be created in | | |
| | Whether to overwrite the logfile or to append. | | |
| | If filename is set, use this to log also to stderr at the given level. | | |
Trainer arguments
PyTorch Lightning `Trainer` flags can also be set using the `--trainer` argument. See the PyTorch Lightning documentation.

This flag is mostly useful to define whether to predict on CPU or GPU:

- `--trainer.gpus 0` to run on CPU,
- `--trainer.gpus n` to run on `n` GPUs (use with `--training.auto_select True` for auto-selection),
- `--trainer.gpus -1` to run on all GPUs.
Examples
The prediction can be done using command-line arguments or a YAML configuration file. Note that CLI arguments override the values from the configuration file.
We provide some images to try out our models. They can be found in `docs/assets` on the GitLab repository. To test the prediction commands, make sure to download them locally:
```bash
mkdir images
wget https://user-images.githubusercontent.com/100838858/219007024-f45433e7-99fd-43b0-bce6-93f63fa72a8f.jpg -P images
wget https://user-images.githubusercontent.com/100838858/219008758-c0097bb4-c55a-4652-ad2e-bba350bee0e4.jpg -P images
```
Predict using a model from Hugging Face
First, clone a trained model from Hugging Face:
```bash
git clone https://huggingface.co/Teklia/pylaia-huginmunin
```
Some files are stored through Git LFS. From the cloned folder, make sure all files are correctly pulled.
You should see three files:
List the image names in `img_list.txt`:

```text
219007024-f45433e7-99fd-43b0-bce6-93f63fa72a8f
219008758-c0097bb4-c55a-4652-ad2e-bba350bee0e4
```
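With many images, this list can be generated rather than typed. The Python sketch below is a hypothetical helper (not part of PyLaia) that writes the name of every image found in a directory to a list file, one per line. It assumes, as in the listing above, that names are written without their extension, and the set of image extensions is an arbitrary choice.

```python
from __future__ import annotations

from pathlib import Path

# Extensions treated as images: an assumption, adjust to your data.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def write_img_list(img_dir: str, list_path: str) -> list[str]:
    """Write the name (without extension) of each image in img_dir to list_path."""
    names = sorted(
        path.stem
        for path in Path(img_dir).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
    # One image name per line, as in img_list.txt above.
    Path(list_path).write_text("".join(f"{name}\n" for name in names), encoding="utf-8")
    return names
```

For the example above, `write_img_list("images", "img_list.txt")` would produce the two-line file shown.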
Predict with:
```bash
pylaia-htr-decode-ctc --common.experiment_dirname pylaia-huginmunin/ \
    --common.model_filename pylaia-huginmunin/model \
    --img_dir [images] \
    pylaia-huginmunin/syms.txt \
    img_list.txt
```
Expected output:
```text
219007024-f45433e7-99fd-43b0-bce6-93f63fa72a8f o g <space> V a l s t a d <space> k a n <space> v i <space> v i s t
219008758-c0097bb4-c55a-4652-ad2e-bba350bee0e4 i k k e <space> g j ø r e <space> R e g n i n g <space> p a a ,
```
Note that by default, each token is separated by a space, and the space symbol is represented by `--decode.input_space` (default: `"<space>"`).
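To make the raw format concrete, the Python sketch below turns one line of this output into readable text. It is a hypothetical post-processing helper, not PyLaia code; it assumes the image id and tokens are separated by a single space (the default separator) and that `<space>` is the space symbol. The `--decode.join_string` and `--decode.convert_spaces` options used in the YAML example produce this formatting directly.

```python
from __future__ import annotations


def format_prediction(
    line: str,
    input_space: str = "<space>",
    output_space: str = " ",
    join_string: str = "",
) -> tuple[str, str]:
    """Turn '<img_id> t o k e n s' into (img_id, readable text)."""
    # The image id is everything before the first space.
    img_id, _, tokens = line.strip().partition(" ")
    # Map the space symbol back to a real space, then join the tokens.
    symbols = (output_space if tok == input_space else tok for tok in tokens.split())
    return img_id, join_string.join(symbols)
```

Applied to the first output line, it returns the image id and `og Valstad kan vi vist`.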
Predict with a YAML configuration file
Run the following command to predict on CPU:
```bash
pylaia-htr-decode-ctc --config config_decode_model.yaml
```
With the following configuration file:
```yaml
syms: pylaia-huginmunin/syms.txt
img_list: img_list.txt
img_dirs:
  - images/
common:
  experiment_dirname: pylaia-huginmunin
  model_filename: pylaia-huginmunin/model
decode:
  join_string: ""
  convert_spaces: true
trainer:
  gpus: 0
```
Expected output:
```text
219007024-f45433e7-99fd-43b0-bce6-93f63fa72a8f og Valstad kan vi vist
219008758-c0097bb4-c55a-4652-ad2e-bba350bee0e4 ikke gjøre Regning paa,
```
Note that setting `--decode.join_string ""` and `--decode.convert_spaces True` displays well-formatted text: tokens are joined without a separator and the space symbol is displayed as a regular space.
Predict with confidence scores
PyLaia estimates character probabilities for each timestep. It is possible to print the probability at line or word level.
Line confidence scores
Run the following command to predict with line confidence scores:
```bash
pylaia-htr-decode-ctc --config config_decode_model.yaml \
    --decode.print_line_confidence_score True
```
Expected output:
```text
219007024-f45433e7-99fd-43b0-bce6-93f63fa72a8f 0.99 og Valstad kan vi vist
219008758-c0097bb4-c55a-4652-ad2e-bba350bee0e4 0.98 ikke gjøre Regning paa,
```
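A common use of line scores is to flag predictions for manual review. The Python sketch below is a hypothetical post-processing helper (not part of PyLaia) that assumes the layout shown above with the default space separator: image id, score, then text. The threshold is an arbitrary value to tune on your own data.

```python
from __future__ import annotations


def parse_line_confidence(line: str) -> tuple[str, float, str]:
    """Split '<img_id> <score> <text>' into its three fields."""
    parts = line.rstrip("\n").split(" ", 2)
    text = parts[2] if len(parts) > 2 else ""  # the prediction may be empty
    return parts[0], float(parts[1]), text


def low_confidence_ids(lines: list[str], threshold: float) -> list[str]:
    """Return the ids of lines whose confidence is below threshold."""
    flagged = []
    for line in lines:
        img_id, score, _ = parse_line_confidence(line)
        if score < threshold:
            flagged.append(img_id)
    return flagged
```

With a threshold of 0.985, only the second line of the example output would be flagged.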
Word confidence scores
Run the following command to predict with word confidence scores:
```bash
pylaia-htr-decode-ctc --config config_decode_model.yaml \
    --decode.print_word_confidence_score True
```
Expected output:
```text
219007024-f45433e7-99fd-43b0-bce6-93f63fa72a8f ['1.00', '1.00', '1.00', '1.00', '1.00'] og Valstad kan vi vist
219008758-c0097bb4-c55a-4652-ad2e-bba350bee0e4 ['1.00', '0.91', '1.00', '0.99'] ikke gjøre Regning paa,
```
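To use word scores downstream, each score has to be paired with its word. The Python sketch below is a hypothetical parser for the output layout above (not part of PyLaia); it assumes the default space separator and that scores align one-to-one with the whitespace-separated words of the text, which holds for both example lines.

```python
from __future__ import annotations

import ast


def parse_word_confidences(line: str) -> tuple[str, list[tuple[str, float]]]:
    """Split "<img_id> ['s1', 's2', ...] <text>" into (img_id, [(word, score), ...])."""
    img_id, rest = line.strip().split(" ", 1)
    # The score list ends at the first closing bracket.
    end = rest.index("]") + 1
    scores = [float(score) for score in ast.literal_eval(rest[:end])]
    words = rest[end:].split()
    if len(words) != len(scores):
        raise ValueError(f"{len(words)} words but {len(scores)} scores in {img_id}")
    return img_id, list(zip(words, scores))
```

On the second line, this pairs `gjøre` with `0.91`, the only word scored below `0.99`.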
Temperature scaling
PyLaia tends to output overly confident probabilities. Temperature scaling can be used to improve the reliability of confidence scores. The best temperature can be determined with a grid search algorithm by maximizing the correlation between 1-CER and confidence scores.
Run the following command to predict calibrated word confidence scores with a temperature of 3.0:

```bash
pylaia-htr-decode-ctc --config config_decode_model.yaml \
    --decode.print_word_confidence_score True \
    --decode.temperature 3.0
```
Expected output:
```text
219007024-f45433e7-99fd-43b0-bce6-93f63fa72a8f ['0.93', '0.85', '0.87', '0.93', '0.85'] og Valstad kan vi vist
219008758-c0097bb4-c55a-4652-ad2e-bba350bee0e4 ['0.93', '0.84', '0.86', '0.83'] ikke gjøre Regning paa,
```
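The grid search for the temperature can be sketched in Python as follows. This is an illustration under stated assumptions, not PyLaia's implementation: the hypothetical `line_confidence` function approximates a line score as the mean of the highest softmax probability per timestep, and the per-line logits and CER values are assumed to come from a labelled validation set. In practice, you could instead rerun `pylaia-htr-decode-ctc` with each candidate `--decode.temperature` and correlate the printed scores with the CER.

```python
from __future__ import annotations

import math


def softmax(logits: list[float], temperature: float) -> list[float]:
    """Softmax of logits divided by the temperature (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def line_confidence(frames: list[list[float]], temperature: float) -> float:
    """Hypothetical line score: mean best-class probability over timesteps."""
    return sum(max(softmax(frame, temperature)) for frame in frames) / len(frames)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y) if std_x and std_y else 0.0


def best_temperature(
    lines_logits: list[list[list[float]]], cers: list[float], grid: list[float]
) -> float:
    """Pick the temperature maximizing corr(confidence, 1 - CER)."""
    accuracy = [1.0 - cer for cer in cers]
    return max(
        grid,
        key=lambda t: pearson([line_confidence(f, t) for f in lines_logits], accuracy),
    )
```

Dividing the logits by a temperature above 1 flattens the softmax, which lowers overconfident scores as in the output above.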
Predict with a language model
PyLaia supports KenLM and ARPA language models.
Once the n-gram model is built, run the following command to combine it with your PyLaia model:

```bash
pylaia-htr-decode-ctc --config config_decode_model_lm.yaml
```
With the following configuration file:
```yaml
syms: pylaia-huginmunin/syms.txt
img_list: img_list.txt
img_dirs:
  - images/
common:
  experiment_dirname: pylaia-huginmunin
  model_filename: pylaia-huginmunin/model
decode:
  join_string: ""
  convert_spaces: true
  use_language_model: true
  language_model_path: pylaia-huginmunin/language_model.arpa.gz
  tokens_path: pylaia-huginmunin/tokens.txt
  lexicon_path: pylaia-huginmunin/lexicon.txt
  language_model_weight: 1.5
  print_line_confidence_score: true
trainer:
  gpus: 0
```
Expected output:
```text
219007024-f45433e7-99fd-43b0-bce6-93f63fa72a8f 0.90 og Valstad kan vi vist
219008758-c0097bb4-c55a-4652-ad2e-bba350bee0e4 0.89 ikke gjøre Regning paa,
```
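The tokens and lexicon files referenced in this configuration have to be prepared beforehand. The Python sketch below shows one possible way to derive them; it is an illustration under assumptions, not a documented PyLaia utility. It assumes each line of `syms.txt` holds a symbol followed by its integer index, groups symbols sharing an index on one line (the tokens format described in the parameter table), and writes a hypothetical lexicon format in which each word is followed by its space-separated characters. Check the lexicon format expected by your decoder before relying on it.

```python
from __future__ import annotations

from collections import defaultdict


def tokens_from_syms(syms_text: str) -> list[str]:
    """Return one line per index, listing every symbol mapped to that index."""
    by_index: dict[int, list[str]] = defaultdict(list)
    for raw in syms_text.splitlines():
        if not raw.strip():
            continue
        # Assumed syms.txt layout: "<symbol> <index>", index 0 being the CTC symbol.
        symbol, index = raw.rsplit(maxsplit=1)
        by_index[int(index)].append(symbol)
    return [" ".join(by_index[i]) for i in sorted(by_index)]


def lexicon_lines(words: list[str], allowed: set[str] | None = None) -> list[str]:
    """Hypothetical lexicon format: '<word> <c1> <c2> ...', one unique word per line."""
    lines = []
    for word in sorted(set(words)):
        # Skip words containing characters the model cannot output.
        if allowed is not None and not set(word) <= allowed:
            continue
        lines.append(f"{word} {' '.join(word)}")
    return lines
```

Filtering on `allowed` (for example, the characters listed in `syms.txt`) keeps the lexicon restricted to spellings the optical model can actually produce.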
Predict with CTC alignment
It is possible to estimate text localization based on CTC alignments with the `--decode.segmentation` option. It returns a list of texts with their estimated coordinates: `(text, x1, y1, x2, y2)`.
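To illustrate the idea, the Python sketch below turns a best-path CTC alignment into character boxes. It is a simplified, hypothetical version, not PyLaia's code: it assumes the most likely label is known for each output frame (index 0 being the CTC blank), starts a new character whenever a non-blank label differs from the previous frame, lets each character extend until the next one starts, and scales frame positions to pixel columns spanning the full image height. PyLaia's actual coordinates (for instance, the 1-based values in the outputs of this section) may be computed differently.

```python
from __future__ import annotations


def ctc_char_boxes(
    frame_labels: list[int],
    symbols: list[str],
    width: int,
    height: int,
    blank: int = 0,
) -> list[tuple[str, int, int, int, int]]:
    """Return (symbol, x1, y1, x2, y2) boxes from a per-frame best-path alignment."""
    # 1. Find the frame where each emitted character starts
    #    (repeats are collapsed unless a blank separates them).
    starts = []  # (frame index, label)
    previous = blank
    for t, label in enumerate(frame_labels):
        if label != blank and label != previous:
            starts.append((t, label))
        previous = label
    # 2. Each character spans from its start frame to the next character's start.
    scale = width / len(frame_labels)  # pixels per frame
    boxes = []
    for i, (start, label) in enumerate(starts):
        end = starts[i + 1][0] if i + 1 < len(starts) else len(frame_labels)
        boxes.append((symbols[label], int(start * scale), 0, int(end * scale) - 1, height - 1))
    return boxes
```

The blank between the two `b` frames in the test below is what keeps them as two characters, as with the `kk` in `ikke` above.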
Character level
To output character localization, use the `--decode.segmentation char` option:

```bash
pylaia-htr-decode-ctc --common.experiment_dirname pylaia-huginmunin/ \
    --common.model_filename pylaia-huginmunin/model \
    --decode.segmentation char \
    --img_dir [images] \
    pylaia-huginmunin/syms.txt \
    img_list.txt
```
Expected output:
```text
219007024-f45433e7-99fd-43b0-bce6-93f63fa72a8f [('o', 1, 1, 31, 128), ('g', 32, 1, 79, 128), ('<space>', 80, 1, 143, 128), ('V', 144, 1, 167, 128), ('a', 168, 1, 223, 128), ('l', 224, 1, 255, 128), ('s', 256, 1, 279, 128), ('t', 280, 1, 327, 128), ('a', 328, 1, 367, 128), ('d', 368, 1, 407, 128), ('<space>', 408, 1, 496, 128), ('k', 497, 1, 512, 128), ('a', 513, 1, 576, 128), ('n', 577, 1, 624, 128), ('<space>', 625, 1, 712, 128), ('v', 713, 1, 728, 128), ('i', 729, 1, 776, 128), ('<space>', 777, 1, 808, 128), ('v', 809, 1, 824, 128), ('i', 825, 1, 872, 128), ('s', 873, 1, 912, 128), ('t', 913, 1, 944, 128)]
219008758-c0097bb4-c55a-4652-ad2e-bba350bee0e4 [('i', 1, 1, 23, 128), ('k', 24, 1, 71, 128), ('k', 72, 1, 135, 128), ('e', 136, 1, 191, 128), ('<space>', 192, 1, 248, 128), ('g', 249, 1, 264, 128), ('j', 265, 1, 312, 128), ('ø', 313, 1, 336, 128), ('r', 337, 1, 376, 128), ('e', 377, 1, 408, 128), ('<space>', 409, 1, 481, 128), ('R', 482, 1, 497, 128), ('e', 498, 1, 545, 128), ('g', 546, 1, 569, 128), ('n', 570, 1, 601, 128), ('i', 602, 1, 665, 128), ('n', 666, 1, 706, 128), ('g', 707, 1, 762, 128), ('<space>', 763, 1, 794, 128), ('p', 795, 1, 802, 128), ('a', 803, 1, 850, 128), ('a', 851, 1, 890, 128), (',', 891, 1, 914, 128)]
```
Word level
To output word localization, use the `--decode.segmentation word` option:

```bash
pylaia-htr-decode-ctc --common.experiment_dirname pylaia-huginmunin/ \
    --common.model_filename pylaia-huginmunin/model \
    --decode.segmentation word \
    --img_dir [images] \
    pylaia-huginmunin/syms.txt \
    img_list.txt
```
Expected output:
```text
219007024-f45433e7-99fd-43b0-bce6-93f63fa72a8f [('og', 1, 1, 79, 128), ('<space>', 80, 1, 143, 128), ('Valstad', 144, 1, 407, 128), ('<space>', 408, 1, 496, 128), ('kan', 497, 1, 624, 128), ('<space>', 625, 1, 712, 128), ('vi', 713, 1, 776, 128), ('<space>', 777, 1, 808, 128), ('vist', 809, 1, 944, 128)]
219008758-c0097bb4-c55a-4652-ad2e-bba350bee0e4 [('ikke', 1, 1, 191, 128), ('<space>', 192, 1, 248, 128), ('gjøre', 249, 1, 408, 128), ('<space>', 409, 1, 481, 128), ('Regning', 482, 1, 762, 128), ('<space>', 763, 1, 794, 128), ('paa,', 795, 1, 914, 128)]
```
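In these examples, each word box is the union of the character boxes between two `<space>` boxes, which are themselves kept unchanged. The Python sketch below reproduces that relationship from the printed character-level output; it is a hypothetical post-processing helper that matches the examples on this page, not necessarily how PyLaia computes word segmentation internally, and it assumes the default space separator between the image id and the list.

```python
from __future__ import annotations

import ast
from typing import List, Tuple

Box = Tuple[str, int, int, int, int]  # (text, x1, y1, x2, y2)


def parse_segmentation(line: str) -> Tuple[str, List[Box]]:
    """Split '<img_id> [(...), ...]' into the id and its list of boxes."""
    img_id, boxes = line.strip().split(" ", 1)
    return img_id, ast.literal_eval(boxes)


def merge_chars_into_words(chars: List[Box], space: str = "<space>") -> List[Box]:
    """Merge runs of non-space character boxes into word boxes."""
    words: List[Box] = []
    current: List[Box] = []

    def flush() -> None:
        # Close the current word: concatenate its text and take the box union.
        if current:
            words.append((
                "".join(box[0] for box in current),
                min(box[1] for box in current),
                min(box[2] for box in current),
                max(box[3] for box in current),
                max(box[4] for box in current),
            ))
            current.clear()

    for box in chars:
        if box[0] == space:
            flush()
            words.append(box)  # space boxes are kept unchanged
        else:
            current.append(box)
    flush()
    return words
```

Applied to the first character-level line, it yields the first word-level line shown above.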
Predict on Right-To-Left data
To predict on Right-To-Left data, use the `--data.reading_order RTL` option:

```bash
pylaia-htr-decode-ctc --common.experiment_dirname pylaia-khatt/ \
    --common.model_filename pylaia-khatt/model \
    --data.reading_order RTL \
    --img_dir [images] \
    pylaia-khatt/syms.txt \
    img_list.txt
```
Expected output:
```text
text_line_1302 العلماء على فهم هذه الكتابات بالدراسات اللغوية السامية مثل العبرانية، وباللغة العربية التي
```