# Bag-of-Entity metric

The `ie-eval boe` command can be used to compute the bag-of-entity recognition and error rates, globally and for each semantic category.
## Metric description

### Recognition rate (Precision, Recall, F1)
The Bag-of-Entities (BoE) recognition rate checks whether predicted entities appear in the ground truth and whether ground truth entities appear in the prediction, regardless of their position.

- The number of True Positives (TP) is the number of entities that appear both in the label and the prediction.
- The number of False Positives (FP) is the number of entities that appear in the prediction, but not in the label.
- The number of False Negatives (FN) is the number of entities that appear in the label, but not in the prediction.
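The counting is a simple multiset comparison. As a rough illustration (not the ie-eval implementation), the sketch below represents each entity as a hypothetical `(category, text)` tuple and uses `collections.Counter` to intersect the two bags:

```python
from collections import Counter

def boe_counts(label_entities, predicted_entities):
    """Count TP, FP and FN between two bags of (category, text) entities."""
    label_bag = Counter(label_entities)
    predicted_bag = Counter(predicted_entities)
    # True positives: entities present in both bags (multiset intersection).
    tp = sum((label_bag & predicted_bag).values())
    # False positives: predicted entities with no counterpart in the label.
    fp = sum((predicted_bag - label_bag).values())
    # False negatives: label entities with no counterpart in the prediction.
    fn = sum((label_bag - predicted_bag).values())
    return tp, fp, fn
```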
From these counts, the Precision, Recall and F1-score can be computed:

- The Precision (P) is the fraction of predicted entities that also appear in the ground truth. It is defined by $\frac{TP}{TP + FP}$.
- The Recall (R) is the fraction of ground truth entities that are predicted by the automatic model. It is defined by $\frac{TP}{TP + FN}$.
- The F1-score is the harmonic mean of the Precision and Recall. It is defined by $\frac{2 \times P \times R}{P + R}$.
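Continuing the sketch above, the scores follow directly from the three counts (the guards against empty bags are an assumption; ie-eval may handle that edge case differently):

```python
def boe_scores(tp, fp, fn):
    """Precision, Recall and F1 from bag-of-entities counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```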
### Error rate (bWER)

The Bag-of-Entities (BoE) error rate is derived from the bag-of-words WER (bWER) metric proposed by Vidal et al. in *End-to-end page-level assessment of handwritten text recognition*. Entities are defined as the combination of a text and its semantic tag. For example:
- Label: `[("person", "Georges Washington"), ("date", "the last day of 1798"), ("date", "January 24th")]`
- Prediction: `[("person", "Georges Woshington"), ("date", "the last day of 1798")]`
From ground truth and predicted entities, we count the number of errors and compute the error rate.
- The number of insertions & deletions ($N_{ID}$) is the absolute difference between the number of ground truth entities and the number of predicted entities. In this case, `("date", "January 24th")` counts as a deletion, so $N_{ID} = 1$.
- The number of substitutions ($N_S$) is defined as $(N_{SID} - N_{ID}) / 2$, where $N_{SID}$ is the total number of errors, i.e. the number of entities from either the label or the prediction that cannot be matched in the other. In this case, `("person", "Georges Woshington")` counts as a substitution, so $N_S = 1$.
- The error rate ($BoE_{WER}$) is then defined as $(N_{ID} + N_S) / |G|$, where $|G|$ is the number of ground truth entities. In this example, $BoE_{WER} = 2 / 3 = 0.67$.
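As a sanity check, here is a minimal sketch of this computation (an illustration of the definitions above, not the ie-eval code); it reproduces the example, giving $N_{ID} = 1$, $N_S = 1$ and $BoE_{WER} = 2/3$:

```python
from collections import Counter

def boe_wer(label_entities, predicted_entities):
    """Bag-of-entities error rate following the bWER definition above."""
    label_bag = Counter(label_entities)
    predicted_bag = Counter(predicted_entities)
    # Insertions & deletions: absolute difference of the bag sizes.
    n_id = abs(sum(label_bag.values()) - sum(predicted_bag.values()))
    # Total errors: entities of either bag that cannot be matched in the other.
    n_sid = sum((label_bag - predicted_bag).values()) + sum((predicted_bag - label_bag).values())
    # Substitutions: each unmatched pair counts once.
    n_s = (n_sid - n_id) // 2
    return (n_id + n_s) / sum(label_bag.values())

label = [
    ("person", "Georges Washington"),
    ("date", "the last day of 1798"),
    ("date", "January 24th"),
]
prediction = [
    ("person", "Georges Woshington"),
    ("date", "the last day of 1798"),
]
print(round(boe_wer(label, prediction), 2))  # 0.67
```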
## Parameters
Here are the available parameters for this metric:
| Parameter          | Description                                             | Type   | Default |
|--------------------|---------------------------------------------------------|--------|---------|
| `--label-dir`      | Path to the directory containing BIO label files.       | `Path` |         |
| `--prediction-dir` | Path to the directory containing BIO prediction files.  | `Path` |         |
| `--by-category`    | Whether to display the metric for each category.        | `bool` | `False` |
The parameters are also described when running `ie-eval boe --help`.
## Examples

### Global evaluation
Use the following command to compute the overall BoE metrics:
```shell
ie-eval boe --label-dir Simara/labels/ \
            --prediction-dir Simara/predictions/
```
It will output the results in Markdown format:
```
2024-01-24 12:20:26,973 INFO/bio_parser.utils: Loading labels...
2024-01-24 12:20:27,104 INFO/bio_parser.utils: Loading prediction...
2024-01-24 12:20:27,187 INFO/bio_parser.utils: The dataset is complete and valid.
```
| Category | bWER (%) | Precision (%) | Recall (%) | F1 (%) | N words | N documents |
|:---------|:--------:|:-------------:|:----------:|:------:|:-------:|:-----------:|
| total | 23.23 | 77.06 | 77.34 | 77.20 | 4430 | 804 |
### Evaluation for each category
Use the following command to compute the BoE metrics for each semantic category:
```shell
ie-eval boe --label-dir Simara/labels/ \
            --prediction-dir Simara/predictions/ \
            --by-category
```
It will output the results in Markdown format:
```
2024-01-24 12:20:48,096 INFO/bio_parser.utils: Loading labels...
2024-01-24 12:20:48,232 INFO/bio_parser.utils: Loading prediction...
2024-01-24 12:20:48,315 INFO/bio_parser.utils: The dataset is complete and valid.
```
| Category | bWER (%) | Precision (%) | Recall (%) | F1 (%) | N words | N documents |
|:--------------------|:--------:|:-------------:|:----------:|:------:|:-------:|:-----------:|
| total | 23.23 | 77.06 | 77.34 | 77.20 | 4430 | 804 |
| cote_article | 2.81 | 97.21 | 97.78 | 97.49 | 676 | 676 |
| cote_serie | 2.81 | 97.64 | 97.78 | 97.71 | 676 | 676 |
| precisions_sur_cote | 11.85 | 88.28 | 88.15 | 88.21 | 675 | 675 |
| intitule | 56.09 | 43.91 | 43.91 | 43.91 | 804 | 804 |
| date | 5.73 | 94.65 | 94.27 | 94.46 | 751 | 751 |
| analyse_compl | 50.45 | 50.85 | 50.71 | 50.78 | 771 | 771 |
| classement | 25.97 | 74.03 | 74.03 | 74.03 | 77 | 77 |