Evaluation

Description

Use the `teklia-dan evaluate` command to evaluate a trained DAN model.

To evaluate DAN on your dataset:

1. Create a JSON configuration file. You can base it on the configuration used for training. Refer to the dedicated page for a description of the parameters.
2. Run `teklia-dan evaluate --config path/to/your/config.json`.

This will, for each evaluated split:

- Create a YAML file with the evaluation results in the `results` subfolder of the `training.output_folder` indicated in your configuration.
- Print a Markdown table of metrics to the console (see the HTR example below).
- Print a Markdown table of Nerval metrics to the console, if the `dataset.tokens` parameter is defined in your configuration (see the HTR and NER example below).
- Print the 5 worst predictions to the console (see the examples below).
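
As a rough illustration of the steps above, the following Python sketch launches the evaluation and then lists the YAML files written to the results folder. It assumes that the dotted name `training.output_folder` maps to nested keys in the JSON configuration, that the subfolder is literally named `results`, and that the result files use the `.yaml` extension. The helper names (`build_command`, `results_folder`, `evaluate_and_list_results`) are hypothetical and not part of DAN.

```python
import json
import subprocess
from pathlib import Path


def build_command(config_path):
    """Return the documented command line as an argument list."""
    return ["teklia-dan", "evaluate", "--config", str(config_path)]


def results_folder(config):
    """Locate the results subfolder (assumes nested keys and the name 'results')."""
    return Path(config["training"]["output_folder"]) / "results"


def evaluate_and_list_results(config_path):
    """Run the evaluation, then return the YAML files found in the results folder."""
    config = json.loads(Path(config_path).read_text())
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(build_command(config_path), check=True)
    # Assumption: result files end in ".yaml".
    return sorted(results_folder(config).glob("*.yaml"))
```

Calling `evaluate_and_list_results("path/to/your/config.json")` would run the command and return the result files it finds, if the assumptions above hold for your setup.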

The display of the worst predictions does not support batch evaluation. If the `training.data.batch_size` parameter is not equal to 1, the WER displayed for an image is the WER of its whole batch, not of that image alone.
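
Since the worst-prediction table is only meaningful per image when the batch size is 1, one option is to derive the evaluation configuration from the training one and override that single value. This is a minimal sketch, assuming `training.data.batch_size` corresponds to nested keys in the JSON file. The functions `make_eval_config` and `write_eval_config` are hypothetical helpers, not part of DAN.

```python
import copy
import json
from pathlib import Path


def make_eval_config(train_config):
    """Return a copy of the training configuration with the batch size forced to 1."""
    eval_config = copy.deepcopy(train_config)
    # Assumed nesting: {"training": {"data": {"batch_size": ...}}}
    eval_config.setdefault("training", {}).setdefault("data", {})["batch_size"] = 1
    return eval_config


def write_eval_config(train_path, eval_path):
    """Read the training configuration and write the derived evaluation one."""
    train_config = json.loads(Path(train_path).read_text())
    Path(eval_path).write_text(json.dumps(make_eval_config(train_config), indent=2))
```

The deep copy leaves the training configuration untouched, so the same file can still be used to resume or reproduce training.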


| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| `--config` | Path to the configuration file. | `pathlib.Path` | |
| `--nerval-threshold` | Distance threshold for the match between gold and predicted entity during Nerval evaluation. `0` would impose perfect matches, `1` would allow completely different strings to be considered as a match. | `float` | `0` |
| `--output-json` | Where to save evaluation results in JSON format. | `pathlib.Path` | `None` |
| `--sets` | Sets to evaluate. Defaults to `train`, `dev`, `test`. | `list[str]` | `["train", "dev", "test"]` |
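
To make the two extremes of `--nerval-threshold` concrete, here is a small sketch of threshold-based matching. It assumes the distance is the character-level edit distance divided by the length of the gold entity; Nerval's actual computation may differ. `edit_distance` and `is_match` are illustrative helpers, not Nerval functions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def is_match(gold, predicted, threshold):
    """Assumed rule: match when the normalized distance is within the threshold."""
    if not gold:
        return not predicted
    return edit_distance(gold, predicted) / len(gold) <= threshold
```

Under this assumed rule, a threshold of `0` accepts `"Dupont"` against `"Dupont"` but rejects `"Dupond"`, while a threshold of `1` accepts even `"abc"` against `"xyz"`, since every character may differ.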


Examples

HTR evaluation

#### DAN evaluation

| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) |
| :---: | :-----------: | :-------: | :-----------: | :-------: | :----------------: |
| train |       x       |     x     |       x       |     x     |         x          |
|  dev  |       x       |     x     |       x       |     x     |         x          |
| test  |       x       |     x     |       x       |     x     |         x          |

#### 5 worst prediction(s)

|   Image name   |  WER  | Alignment between ground truth - prediction |
| :------------: | :---: | :-----------------------------------------: |
| <image_id>.png |   x   |                      x                      |
|                |       |                      |                      |
|                |       |                      x                      |

HTR and NER evaluation

#### DAN evaluation

| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) |  NER  |
| :---: | :-----------: | :-------: | :-----------: | :-------: | :----------------: | :---: |
| train |       x       |     x     |       x       |     x     |         x          |   x   |
|  dev  |       x       |     x     |       x       |     x     |         x          |   x   |
| test  |       x       |     x     |       x       |     x     |         x          |   x   |

#### Nerval evaluation

##### train

|   tag   | predicted | matched | Precision | Recall |  F1   | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :---: | :-----: |
| Surname |     x     |    x    |     x     |   x    |   x   |    x    |
|   All   |     x     |    x    |     x     |   x    |   x   |    x    |

##### dev

|   tag   | predicted | matched | Precision | Recall |  F1   | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :---: | :-----: |
| Surname |     x     |    x    |     x     |   x    |   x   |    x    |
|   All   |     x     |    x    |     x     |   x    |   x   |    x    |

##### test

|   tag   | predicted | matched | Precision | Recall |  F1   | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :---: | :-----: |
| Surname |     x     |    x    |     x     |   x    |   x   |    x    |
|   All   |     x     |    x    |     x     |   x    |   x   |    x    |

#### 5 worst prediction(s)

|   Image name   |  WER  | Alignment between ground truth - prediction |
| :------------: | :---: | :-----------------------------------------: |
| <image_id>.png |   x   |                      x                      |
|                |       |                      |                      |
|                |       |                      x                      |