Evaluation

Description

Use the teklia-dan evaluate command to evaluate a trained DAN model.

To evaluate DAN on your dataset:

  1. Create a JSON configuration file. You can base it on the training configuration. Refer to the dedicated page for a description of the parameters.

  2. Run teklia-dan evaluate --config path/to/your/config.json.
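
The two steps above can be sketched in Python. This is only a minimal sketch: it assumes that the dotted parameter names used on this page (such as training.data.batch_size) map to nested JSON keys, and the helper names `make_eval_config` and `evaluate_command`, the truncated configuration and the file name are hypothetical.

```python
import copy
import json
import tempfile
from pathlib import Path


def make_eval_config(train_config: dict, batch_size: int = 1) -> dict:
    """Return a copy of the training configuration adapted for evaluation.

    Assumption: dotted parameter names map to nested JSON keys.
    """
    config = copy.deepcopy(train_config)
    # A batch size of 1 keeps the WER of the worst predictions per image.
    config.setdefault("training", {}).setdefault("data", {})["batch_size"] = batch_size
    return config


def evaluate_command(config_path: Path) -> list[str]:
    """Build the argument list for the documented `teklia-dan evaluate` call."""
    return ["teklia-dan", "evaluate", "--config", str(config_path)]


# Hypothetical, heavily truncated training configuration.
train_config = {
    "training": {"output_folder": "outputs/my_model", "data": {"batch_size": 8}}
}

eval_config = make_eval_config(train_config)
config_path = Path(tempfile.mkdtemp()) / "eval_config.json"
config_path.write_text(json.dumps(eval_config, indent=2))

command = evaluate_command(config_path)
print(" ".join(command))
```

Passing `command` to `subprocess.run(command, check=True)` would then launch the evaluation, provided teklia-dan is installed and the configuration is complete.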

For each evaluated split, this command will:

  1. Create a YAML file with the evaluation results in the results subfolder of the training.output_folder indicated in your configuration.

  2. Print a Markdown table of metrics to the console (see HTR example below).

  3. Print a Markdown table of Nerval metrics to the console, if the dataset.tokens parameter is defined in your configuration (see HTR and NER example below).

  4. Print the 5 worst predictions to the console (see examples below).
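
As a rough illustration of how the configuration drives these outputs, the sketch below derives the results folder and whether a Nerval table is to be expected. It assumes that dotted parameter names map to nested JSON keys and that "defined" means present and not null; the helper name `expected_outputs` and the sample values are hypothetical.

```python
from pathlib import Path


def expected_outputs(config: dict) -> dict:
    """Summarize what the evaluation should produce for a given configuration."""
    training = config.get("training", {})
    dataset = config.get("dataset", {})
    return {
        # YAML results are written to the `results` subfolder of the output folder.
        "results_folder": Path(training["output_folder"]) / "results",
        # The Nerval table is only printed when `dataset.tokens` is defined.
        "nerval_table": dataset.get("tokens") is not None,
    }


# Hypothetical configurations: one with entity tokens, one without.
htr_ner_config = {
    "training": {"output_folder": "outputs/my_model"},
    "dataset": {"tokens": "tokens.yml"},
}
htr_config = {"training": {"output_folder": "outputs/my_model"}, "dataset": {}}

print(expected_outputs(htr_ner_config))
print(expected_outputs(htr_config))
```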

The display of the worst predictions does not support batch evaluation. If the training.data.batch_size parameter is not equal to 1, the WER displayed for each image is the WER of its whole batch, not of that image alone.
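
To see why this matters, the sketch below compares per-image WER with the WER of a whole batch. It is a minimal illustration, assuming that a batch WER is the total number of word errors divided by the total number of reference words, which may differ from DAN's own aggregation; the function names and sample sentences are hypothetical.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance between two word sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def wer(pairs: list[tuple[str, str]]) -> float:
    """WER over (ground truth, prediction) pairs: word errors / reference words."""
    errors = sum(word_edit_distance(gt.split(), pred.split()) for gt, pred in pairs)
    words = sum(len(gt.split()) for gt, _ in pairs)
    return errors / words


batch = [
    ("the quick brown fox", "the quick brown fox"),  # perfect prediction
    ("jumps over the dog", "jump over dog"),  # 2 errors out of 4 words
]

per_image = [wer([pair]) for pair in batch]
whole_batch = wer(batch)
print(per_image)  # [0.0, 0.5]
print(whole_batch)  # 0.25
```

With a batch size of 2, both images would be reported with the batch value of 0.25, which hides the fact that only the second image is poorly transcribed.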

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| --config | Path to the configuration file. | pathlib.Path | |
| --nerval-threshold | Distance threshold for the match between gold and predicted entity during Nerval evaluation. 0 would impose perfect matches, 1 would allow completely different strings to be considered as a match. | float | 0 |
| --output-json | Where to save evaluation results in JSON format. | pathlib.Path | None |
| --sets | Sets to evaluate. Defaults to train, dev, test. | list[str] | ["train", "dev", "test"] |
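
The effect of --nerval-threshold can be illustrated with a small sketch. It assumes the distance is a character-level edit distance normalized by the length of the longer string, which is only one plausible reading of the description above and not necessarily how Nerval computes it; the function names and sample entities are hypothetical.

```python
def char_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def is_match(gold: str, predicted: str, threshold: float = 0.0) -> bool:
    """Accept the predicted entity if its normalized distance to gold is within the threshold."""
    longest = max(len(gold), len(predicted))
    if longest == 0:
        return True
    return char_edit_distance(gold, predicted) / longest <= threshold


print(is_match("Dupont", "Dupont"))  # True: identical strings always match
print(is_match("Dupont", "Dupond"))  # False: threshold 0 requires a perfect match
print(is_match("Dupont", "Dupond", threshold=0.3))  # True: 1 error out of 6 characters
print(is_match("Dupont", "Martin", threshold=1.0))  # True: threshold 1 accepts anything
```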

Examples

HTR evaluation

#### DAN evaluation

| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) |
| :---: | :-----------: | :-------: | :-----------: | :-------: | :----------------: |
| train |       x       |     x     |       x       |     x     |         x          |
|  dev  |       x       |     x     |       x       |     x     |         x          |
| test  |       x       |     x     |       x       |     x     |         x          |

#### 5 worst prediction(s)

|   Image name   |  WER  | Alignment between ground truth - prediction |
| :------------: | :---: | :-----------------------------------------: |
| <image_id>.png |   x   |                      x                      |
|                |       |                      |                      |
|                |       |                      x                      |

HTR and NER evaluation

#### DAN evaluation

| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) |  NER  |
| :---: | :-----------: | :-------: | :-----------: | :-------: | :----------------: | :---: |
| train |       x       |     x     |       x       |     x     |         x          |   x   |
|  dev  |       x       |     x     |       x       |     x     |         x          |   x   |
| test  |       x       |     x     |       x       |     x     |         x          |   x   |

#### Nerval evaluation

##### train

|   tag   | predicted | matched | Precision | Recall |  F1   | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :---: | :-----: |
| Surname |     x     |    x    |     x     |   x    |   x   |    x    |
|   All   |     x     |    x    |     x     |   x    |   x   |    x    |

##### dev

|   tag   | predicted | matched | Precision | Recall |  F1   | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :---: | :-----: |
| Surname |     x     |    x    |     x     |   x    |   x   |    x    |
|   All   |     x     |    x    |     x     |   x    |   x   |    x    |

##### test

|   tag   | predicted | matched | Precision | Recall |  F1   | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :---: | :-----: |
| Surname |     x     |    x    |     x     |   x    |   x   |    x    |
|   All   |     x     |    x    |     x     |   x    |   x   |    x    |

#### 5 worst prediction(s)

|   Image name   |  WER  | Alignment between ground truth - prediction |
| :------------: | :---: | :-----------------------------------------: |
| <image_id>.png |   x   |                      x                      |
|                |       |                      |                      |
|                |       |                      x                      |