Prediction

Description

Use the teklia-dan predict command to apply a trained DAN model to an image.

Parameter	Description	Type	Default
`--image-dir`	Path to the folder where the images to predict are stored. Must not be provided with `--image`.	`pathlib.Path`
`--image-extension`	The extension of the images in the folder. Ignored if `--image-dir` is not provided.	`str`	`.jpg`
`--model`	Path to the directory containing the model, the YAML parameters file and the charset file to use for prediction.	`pathlib.Path`
`--font`	Path to the font file to use for the GIF of the attention map.	`pathlib.Path`	`fonts/LinuxLibertine.ttf`
`--maximum-font-size`	Maximum font size to use for the GIF of the attention map.	`int`	`32`
`--output`	Path to the output folder. Results will be saved in this directory.	`pathlib.Path`
`--tokens`	Path to a YAML file containing a mapping between starting tokens and end tokens. Needed for entities.	`pathlib.Path`
`--temperature`	Temperature scaling scalar parameter.	`float`	`1.0`
`--confidence-score`	Whether to return confidence scores.	`bool`	`False`
`--confidence-score-levels`	Levels at which to return confidence scores. Should be any combination of `["line", "word", "char", "ner"]`.	`str`
`--attention-map`	Whether to plot attention maps.	`bool`	`False`
`--attention-map-scale`	Image scaling factor before creating the GIF.	`float`	`0.5`
`--alpha-factor`	Alpha factor that controls how much the attention map is shown to the user during prediction.	`float`	`0.9`
`--color-map`	A matplotlib colormap to use for the attention maps.	`str`	`nipy_spectral`
`--attention-map-level`	Level at which to plot the attention maps. Should be one of `["line", "word", "char", "ner"]`.	`str`	`"line"`
`--attention-from-binarization`	Whether to combine the attention map and the binarized image to extract polygons.	`bool`	`False`
`--predict-objects`	Whether to return polygon coordinates.	`bool`	`False`
`--max-object-height`	Maximum height for predicted objects. If set, grid search segmentation will be applied and width will be normalized to element width.	`int`
`--word-separators`	List of word separators.	`list`	`[" ", "\n"]`
`--line-separators`	List of line separators.	`list`	`["\n"]`
`--gpu-device`	Use a specific GPU if available.	`int`
`--batch-size`	Size of the batches for prediction.	`int`	`1`
`--start-token`	Use a specific starting token at the beginning of the prediction. Useful when making predictions on different single pages.	`str`
`--use-language-model`	Whether to use an explicit language model to rescore text hypotheses.	`bool`	`False`
`--compile-model`	Whether to compile the model. Recommended to speed up inference.	`bool`	`False`
`--dynamic-mode`	Whether to use the dynamic mode during model compilation. Recommended for prediction on images of variable size.	`bool`	`False`

The --model argument expects a directory with the following files:

a model.pt file,
a charset.pkl file,
a parameters.yml file corresponding to the inference_parameters.yml file generated during training.
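
Before launching a prediction, it can help to confirm that the model directory has this layout. The following is a minimal sketch, not part of teklia-dan: the helper name `missing_model_files` is hypothetical, and only the three file names come from the list above.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = ("model.pt", "charset.pkl", "parameters.yml")


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


# "models" mirrors the directory used in the examples of this page.
missing = missing_model_files("models")
if missing:
    print("Missing files:", ", ".join(missing))
```

An empty list means the directory contains all three files; the check says nothing about whether their contents are valid.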

The default font used for --font is Linux Libertine.

Examples

Predict with confidence scores

To run a prediction with confidence scores, run this command:

teklia-dan predict \
    --image-dir images/ \
    --model models \
    --output predict/ \
    --confidence-score

It will create the following JSON file named after the image in the predict folder:

{
  "text": "\u24c4Bousquet \u24bbElisabeth \u24bdfemme du pr\u00e9c\u00e9dent \u24b8Femme mari\u00e9e\n\u24c4Autrau \u24bbPierre \u24bdfils \u24b8Gar\u00e7on\n\u24c4Tersypore \u24bbRose \u24bden nourrice \u24b8Fille \u24c1enfant trouv\u00e9\n\u24c4Fournier \u24bbPauline \u24bdidem \u24b8Fille\n\u24c5Regibeaud \u24bbJoseph \u24c2propri\u00e9taire \u24b8Homme mari\u00e9\n\u24c4Femme Reyibeaud \u24bbVictoire \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Regibeaud \u24bbMarius \u24b8Gar\u00e7on\n\u24c4Regibeaud \u24bbAlphonse \u24b8Gar\u00e7on\n\u24c4Regibeaud \u24bbEl\u00e9onore \u24b8Fille\n\u24c5Rouvier \u24bbJean \u24c2fournier \u24b8Homme mari\u00e9\n\u24c4Blanc \u24bbRosalie \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Rouvier \u24bbVirginie \u24b8Fille\n\u24c4Chauvier \u24bbFille de la femme Rouvier \u24bbBabet \u24b8Fille\n\u24c4Rouvier \u24bbBabet \u24b8Veuve\n\u24c4Rue de la maison \u24bbFoudel \u24b8Femme mari\u00e9e\n\u24c4Ruille \u24bbMagdeleine \u24bdfemme Fabre \u24b8Femme mari\u00e9e\n\u24c4Fabre \u24bbPholastique \u24b8Fille\n\u24c5Maurel \u24bbSimphorose \u24c2veuve regibeaud \u24b8Veuve\n\u24c4Reyibeaud \u24bbFran\u00e7ois \u24bdfils \u24b8Gar\u00e7on\n\u24c5Giraud \u24bbJoseph \u24c2propri\u00e9taire \u24b8Homme mari\u00e9\n\u24c4Maudier \u24bbCatherine \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Giraud \u24bbMarie \u24b8Fille\n\u24c4Giraud \u24bbBaptistine \u24b8Fille\n\u24c4Beuf \u24bbTh\u00e9r\u00e8se \u24bdfemme pelin \u24b8Femme mari\u00e9e\n\u24c4Pelin \u24bbMarie \u24bdfille \u24b8Fille\n\u24c4Pelin \u24bbMarius \u24b8Gar\u00e7on\n\u24c4Herssodau \u24bbH\u00e9laire \u24c2ouv nourrice \u24b8Gar\u00e7on",
  "confidences": {
    "total": 0.97
  },
  "language_model": {}
}
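
The output file can then be consumed from a script. The sketch below assumes only the keys shown in the JSON above (`text` and `confidences.total`); the helper name `read_prediction` and the shortened sample string are illustrative and stand in for the real file content.

```python
import json


def read_prediction(raw_json):
    """Return (text, total confidence or None) from a prediction JSON string."""
    prediction = json.loads(raw_json)
    # "confidences" is only expected when --confidence-score is set,
    # so fall back to None when the key is absent.
    total = prediction.get("confidences", {}).get("total")
    return prediction["text"], total


# Shortened stand-in for the content of a generated JSON file.
sample = r'{"text": "\u24c4Bousquet \u24bbElisabeth\n\u24c4Autrau \u24bbPierre", "confidences": {"total": 0.97}, "language_model": {}}'

text, total = read_prediction(sample)
lines = text.split("\n")  # one entry per predicted text line
```

For a real run, the string would be read from the JSON file written in the output folder, for instance with `Path(...).read_text()`.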

Predict with confidence scores and line-level attention maps

To run a prediction with confidence scores and plot line-level attention maps, run this command:

teklia-dan predict \
    --image-dir images/ \
    --model models \
    --output predict/ \
    --confidence-score \
    --attention-map

It will create the following JSON file named after the image and a GIF showing a line-level attention map in the predict folder:

{
  "text": "\u24c4Bousquet \u24bbElisabeth \u24bdfemme du pr\u00e9c\u00e9dent \u24b8Femme mari\u00e9e\n\u24c4Autrau \u24bbPierre \u24bdfils \u24b8Gar\u00e7on\n\u24c4Tersypore \u24bbRose \u24bden nourrice \u24b8Fille \u24c1enfant trouv\u00e9\n\u24c4Fournier \u24bbPauline \u24bdidem \u24b8Fille\n\u24c5Regibeaud \u24bbJoseph \u24c2propri\u00e9taire \u24b8Homme mari\u00e9\n\u24c4Femme Reyibeaud \u24bbVictoire \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Regibeaud \u24bbMarius \u24b8Gar\u00e7on\n\u24c4Regibeaud \u24bbAlphonse \u24b8Gar\u00e7on\n\u24c4Regibeaud \u24bbEl\u00e9onore \u24b8Fille\n\u24c5Rouvier \u24bbJean \u24c2fournier \u24b8Homme mari\u00e9\n\u24c4Blanc \u24bbRosalie \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Rouvier \u24bbVirginie \u24b8Fille\n\u24c4Chauvier \u24bbFille de la femme Rouvier \u24bbBabet \u24b8Fille\n\u24c4Rouvier \u24bbBabet \u24b8Veuve\n\u24c4Rue de la maison \u24bbFoudel \u24b8Femme mari\u00e9e\n\u24c4Ruille \u24bbMagdeleine \u24bdfemme Fabre \u24b8Femme mari\u00e9e\n\u24c4Fabre \u24bbPholastique \u24b8Fille\n\u24c5Maurel \u24bbSimphorose \u24c2veuve regibeaud \u24b8Veuve\n\u24c4Reyibeaud \u24bbFran\u00e7ois \u24bdfils \u24b8Gar\u00e7on\n\u24c5Giraud \u24bbJoseph \u24c2propri\u00e9taire \u24b8Homme mari\u00e9\n\u24c4Maudier \u24bbCatherine \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Giraud \u24bbMarie \u24b8Fille\n\u24c4Giraud \u24bbBaptistine \u24b8Fille\n\u24c4Beuf \u24bbTh\u00e9r\u00e8se \u24bdfemme pelin \u24b8Femme mari\u00e9e\n\u24c4Pelin \u24bbMarie \u24bdfille \u24b8Fille\n\u24c4Pelin \u24bbMarius \u24b8Gar\u00e7on\n\u24c4Herssodau \u24bbH\u00e9laire \u24c2ouv nourrice \u24b8Gar\u00e7on",
  "confidences": {
    "total": 0.97
  },
  "language_model": {},
  "attention_gif": "predict/example_line.gif"
}

Predict with confidence scores and word-level attention maps

To run a prediction with confidence scores and plot word-level attention maps, run this command:

teklia-dan predict \
    --image-dir images/ \
    --model models \
    --output predict/ \
    --confidence-score \
    --attention-map \
    --attention-map-level word

It will create the following JSON file named after the image and a GIF showing a word-level attention map in the predict folder:

{
  "text": "\u24c4Bousquet \u24bbElisabeth \u24bdfemme du pr\u00e9c\u00e9dent \u24b8Femme mari\u00e9e\n\u24c4Autrau \u24bbPierre \u24bdfils \u24b8Gar\u00e7on\n\u24c4Tersypore \u24bbRose \u24bden nourrice \u24b8Fille \u24c1enfant trouv\u00e9\n\u24c4Fournier \u24bbPauline \u24bdidem \u24b8Fille\n\u24c5Regibeaud \u24bbJoseph \u24c2propri\u00e9taire \u24b8Homme mari\u00e9\n\u24c4Femme Reyibeaud \u24bbVictoire \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Regibeaud \u24bbMarius \u24b8Gar\u00e7on\n\u24c4Regibeaud \u24bbAlphonse \u24b8Gar\u00e7on\n\u24c4Regibeaud \u24bbEl\u00e9onore \u24b8Fille\n\u24c5Rouvier \u24bbJean \u24c2fournier \u24b8Homme mari\u00e9\n\u24c4Blanc \u24bbRosalie \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Rouvier \u24bbVirginie \u24b8Fille\n\u24c4Chauvier \u24bbFille de la femme Rouvier \u24bbBabet \u24b8Fille\n\u24c4Rouvier \u24bbBabet \u24b8Veuve\n\u24c4Rue de la maison \u24bbFoudel \u24b8Femme mari\u00e9e\n\u24c4Ruille \u24bbMagdeleine \u24bdfemme Fabre \u24b8Femme mari\u00e9e\n\u24c4Fabre \u24bbPholastique \u24b8Fille\n\u24c5Maurel \u24bbSimphorose \u24c2veuve regibeaud \u24b8Veuve\n\u24c4Reyibeaud \u24bbFran\u00e7ois \u24bdfils \u24b8Gar\u00e7on\n\u24c5Giraud \u24bbJoseph \u24c2propri\u00e9taire \u24b8Homme mari\u00e9\n\u24c4Maudier \u24bbCatherine \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Giraud \u24bbMarie \u24b8Fille\n\u24c4Giraud \u24bbBaptistine \u24b8Fille\n\u24c4Beuf \u24bbTh\u00e9r\u00e8se \u24bdfemme pelin \u24b8Femme mari\u00e9e\n\u24c4Pelin \u24bbMarie \u24bdfille \u24b8Fille\n\u24c4Pelin \u24bbMarius \u24b8Gar\u00e7on\n\u24c4Herssodau \u24bbH\u00e9laire \u24c2ouv nourrice \u24b8Gar\u00e7on",
  "confidences": {
    "total": 0.97
  },
  "language_model": {},
  "attention_gif": "predict/example_word.gif"
}

Predict with line-level attention maps and extract polygons

To run a prediction, plot line-level attention maps, and extract polygons, run this command:

teklia-dan predict \
    --image-dir images/ \
    --model models \
    --output predict/ \
    --attention-map \
    --predict-objects

It will create the following JSON file named after the image and a GIF showing a line-level attention map in the predict folder:

{
  "text": "\u24c4Bousquet \u24bbElisabeth \u24bdfemme du pr\u00e9c\u00e9dent \u24b8Femme mari\u00e9e\n\u24c4Autrau \u24bbPierre \u24bdfils \u24b8Gar\u00e7on\n\u24c4Tersypore \u24bbRose \u24bden nourrice \u24b8Fille \u24c1enfant trouv\u00e9\n\u24c4Fournier \u24bbPauline \u24bdidem \u24b8Fille\n\u24c5Regibeaud \u24bbJoseph \u24c2propri\u00e9taire \u24b8Homme mari\u00e9\n\u24c4Femme Reyibeaud \u24bbVictoire \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Regibeaud \u24bbMarius \u24b8Gar\u00e7on\n\u24c4Regibeaud \u24bbAlphonse \u24b8Gar\u00e7on\n\u24c4Regibeaud \u24bbEl\u00e9onore \u24b8Fille\n\u24c5Rouvier \u24bbJean \u24c2fournier \u24b8Homme mari\u00e9\n\u24c4Blanc \u24bbRosalie \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Rouvier \u24bbVirginie \u24b8Fille\n\u24c4Chauvier \u24bbFille de la femme Rouvier \u24bbBabet \u24b8Fille\n\u24c4Rouvier \u24bbBabet \u24b8Veuve\n\u24c4Rue de la maison \u24bbFoudel \u24b8Femme mari\u00e9e\n\u24c4Ruille \u24bbMagdeleine \u24bdfemme Fabre \u24b8Femme mari\u00e9e\n\u24c4Fabre \u24bbPholastique \u24b8Fille\n\u24c5Maurel \u24bbSimphorose \u24c2veuve regibeaud \u24b8Veuve\n\u24c4Reyibeaud \u24bbFran\u00e7ois \u24bdfils \u24b8Gar\u00e7on\n\u24c5Giraud \u24bbJoseph \u24c2propri\u00e9taire \u24b8Homme mari\u00e9\n\u24c4Maudier \u24bbCatherine \u24bdsa femme \u24b8Femme mari\u00e9e\n\u24c4Giraud \u24bbMarie \u24b8Fille\n\u24c4Giraud \u24bbBaptistine \u24b8Fille\n\u24c4Beuf \u24bbTh\u00e9r\u00e8se \u24bdfemme pelin \u24b8Femme mari\u00e9e\n\u24c4Pelin \u24bbMarie \u24bdfille \u24b8Fille\n\u24c4Pelin \u24bbMarius \u24b8Gar\u00e7on\n\u24c4Herssodau \u24bbH\u00e9laire \u24c2ouv nourrice \u24b8Gar\u00e7on",
  "confidences": {
    "total": 0.97
  },
  "language_model": {},
  "objects": [
    {
      "confidence": 0.62,
      "polygon": [
        [
          303,
          324
        ],
        [
          995,
          324
        ],
        [
          995,
          402
        ],
        [
          303,
          402
        ]
      ],
      "text": "\u24c4Bousquet \u24bbElisabeth \u24bdfemme du pr\u00e9c\u00e9dent \u24b8Femme mari\u00e9e",
      "text_confidence": 0.98
    },
    ...
  ],
  "attention_gif": "predict/example_line.gif"
}
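
Each entry of `objects` carries a polygon that can be reduced to a bounding box, for example to crop the corresponding line from the image. This is a minimal sketch that assumes each point is an `[x, y]` pair in pixel coordinates, which the output above suggests but does not state; `polygon_bounds` is a hypothetical helper.

```python
def polygon_bounds(polygon):
    """Return (x_min, y_min, x_max, y_max) for a list of [x, y] points."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# First object from the JSON output above (text omitted for brevity).
first_object = {
    "confidence": 0.62,
    "polygon": [[303, 324], [995, 324], [995, 402], [303, 402]],
    "text_confidence": 0.98,
}

x_min, y_min, x_max, y_max = polygon_bounds(first_object["polygon"])
width, height = x_max - x_min, y_max - y_min
```

Taking the minimum and maximum of each coordinate also works for polygons that are not axis-aligned rectangles, at the cost of a looser box.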

Predict with an external n-gram language model

This example assumes that you have already trained a language model.

The weight parameter defines how much weight to give to the language model. It should be set carefully (usually between 0.5 and 2.0), as it affects the quality of the predictions.
Linebreaks are treated as spaces by language models. As a result, the rescored predictions will not include linebreaks.

Language model at character level

Update the parameters.yml file obtained during DAN training.

parameters:
  ...
  language_model:
    model: my_dataset/language_model/model_characters.arpa
    lexicon: my_dataset/language_model/lexicon_characters.txt
    tokens: my_dataset/language_model/tokens.txt
    weight: 0.5

Then, run this command:

teklia-dan predict \
    --image-dir images/ \
    --model models \
    --use-language-model \
    --output predict_char_lm/

It will create the following JSON file named after the image in the predict_char_lm folder:

{
  "text": "etc., some jeg netop idag\nholder Vask paa.\nLeien af Skj\u00f8rterne\nbestad i at jeg kj\u00f8bte\net Forkl\u00e6de til hver\naf de to Piger, some\nhavde laant os dem.\nResten var Vask af Hardan-\ngerskj\u00f8rter og et Forkl\u00e6de,\nsamt Fragt paa det Gods\n(N\u00f8i) some man sendte\nmig ubet\u00e6lt.\nIdag fik jeg hyggeligt\nFrimarkebrev fra Fosvold\nMed Hilsen\nDeres\nHulda Garborg",
  "language_model": {
    "text": "eet., some jeg netop idag holder Vask paa. Leien af Skj\u00f8rterne bestad i at jeg kj\u00f8bte et Forkl\u00e6de til hver af de to Piger, some havde laant os dem. Resten var Vask af Hardan- gerskj\u00f8rter og et Forkl\u00e6de, samt Fragt paa det Gods (T\u00f8i) some man sendte mig ubet\u00e6lt. Idag fik jeg hyggeligt Frimarkebrev fra Fosvold Med Hilsen Deres Hulda Garborg",
    "confidence": 0.9
  }
}
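
Because the rescored text has no linebreaks, comparing it with the raw prediction is easier word by word than line by line. The sketch below uses Python's standard `difflib` module to list the words that differ between `text` and `language_model.text`; the function name `changed_words` and the shortened sample strings are illustrative.

```python
import difflib


def changed_words(raw_text, lm_text):
    """Return (raw, rescored) word groups that differ between two transcriptions."""
    # str.split() without arguments splits on spaces and linebreaks alike,
    # which removes the linebreak difference between the two texts.
    raw_words = raw_text.split()
    lm_words = lm_text.split()
    matcher = difflib.SequenceMatcher(a=raw_words, b=lm_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(raw_words[i1:i2]), " ".join(lm_words[j1:j2])))
    return changes


# Beginning of the two transcriptions from the JSON output above.
raw = "etc., some jeg netop idag\nholder Vask paa."
rescored = "eet., some jeg netop idag holder Vask paa."
differences = changed_words(raw, rescored)
```

An empty result means the language model left the transcription unchanged apart from linebreaks.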

Language model at subword level

Update the parameters.yml file obtained during DAN training.

parameters:
  ...
  language_model:
    model: my_dataset/language_model/model_subwords.arpa
    lexicon: my_dataset/language_model/lexicon_subwords.txt
    tokens: my_dataset/language_model/tokens.txt
    weight: 0.5

Then, run this command:

teklia-dan predict \
    --image-dir images/ \
    --model models \
    --use-language-model \
    --output predict_subword_lm

It will create the following JSON file named after the image in the predict_subword_lm folder:

{
  "text": "etc., some jeg netop idag\nholder Vask paa.\nLeien af Skj\u00f8rterne\nbestad i at jeg kj\u00f8bte\net Forkl\u00e6de til hver\naf de to Piger, some\nhavde laant os dem.\nResten var Vask af Hardan-\ngerskj\u00f8rter og et Forkl\u00e6de,\nsamt Fragt paa det Gods\n(N\u00f8i) some man sendte\nmig ubet\u00e6lt.\nIdag fik jeg hyggeligt\nFrimarkebrev fra Fosvold\nMed Hilsen\nDeres\nHulda Garborg",
  "language_model": {
    "text": "eet., some jeg netop idag holder Vask paa. Leien af Skj\u00f8rterne bestad i at jeg kj\u00f8bte et Forkl\u00e6de til hver af de to Piger, some havde laant os dem. Resten var Vask af Hardan- gerskj\u00f8rter og et Forkl\u00e6de, samt Fragt paa det Gods (T\u00f8i) some man sendte mig ubet\u00e6lt. Idag fik jeg hyggeligt Frim\u00e6rkebrev fra Fosvold Med Hilsen Deres Hulda Garborg",
    "confidence": 0.84
  }
}

Language model at word level

Update the parameters.yml file obtained during DAN training.

parameters:
  ...
  language_model:
    model: my_dataset/language_model/model_words.arpa
    lexicon: my_dataset/language_model/lexicon_words.txt
    tokens: my_dataset/language_model/tokens.txt
    weight: 0.5

Then, run this command:

teklia-dan predict \
    --image-dir images/ \
    --model models \
    --use-language-model \
    --output predict_word_lm/

It will create the following JSON file named after the image in the predict_word_lm folder:

{
  "text": "etc., some jeg netop idag\nholder Vask paa.\nLeien af Skj\u00f8rterne\nbestad i at jeg kj\u00f8bte\net Forkl\u00e6de til hver\naf de to Piger, some\nhavde laant os dem.\nResten var Vask af Hardan-\ngerskj\u00f8rter og et Forkl\u00e6de,\nsamt Fragt paa det Gods\n(N\u00f8i) some man sendte\nmig ubet\u00e6lt.\nIdag fik jeg hyggeligt\nFrimarkebrev fra Fosvold\nMed Hilsen\nDeres\nHulda Garborg",
  "language_model": {
    "text": "etc., some jeg netop idag holder Vask paa. Leien af Skj\u00f8rterne bestad i at jeg kj\u00f8bte et Forkl\u00e6de til hver af de to Piger, some havde laant os dem. Resten var Vask af Hardan- gerskj\u00f8rter og et Forkl\u00e6de, samt Fragt paa det Gods (T\u00f8i) some man sendte mig ubetalt. Idag fik jeg hyggeligt Frim\u00e6rkebrev fra Fosvold Med Hilsen Deres Hulda Garborg",
    "confidence": 0.77
  }
}

Speed up prediction with model compilation

To speed up prediction, it is recommended to compile models using torch.compile.

Run this command to use this option:

teklia-dan predict \
    --image-dir images/ \
    --model models \
    --output predict/ \
    --compile-model

When predicting on images of variable size, it is recommended to enable the dynamic mode:

teklia-dan predict \
    --image-dir images/ \
    --model models \
    --output predict/ \
    --compile-model \
    --dynamic-mode