Article and section separation

The newspaper-eval article command provides a set of metrics to evaluate the quality of article and section detection, based on surface coverage.

To learn more about the options of this command, run newspaper-eval article --help.

Purpose

This command evaluates the alignment between predicted and ground truth articles and sections by performing the following steps:

  1. Matching process for sections and articles:

    • Compute an Intersection over Union (IoU) matrix between all predicted and reference zones.

    • Use the Hungarian matching algorithm to pair predicted and ground truth zones (articles or sections) with an IoU greater than 0.5.

  2. Metric computation:

    • Compute precision, recall, and F1 score based on the matched pairs.

    • Compute the mean IoU across all matched predictions.
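The steps above can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation. It assumes zones are axis-aligned boxes `(x_min, y_min, x_max, y_max)`, it replaces the Hungarian algorithm with an exhaustive search that finds the same optimal assignment on small inputs, and it applies the threshold after matching. The names `iou`, `best_assignment`, and `evaluate` are hypothetical.

```python
from itertools import permutations

# Assumption: a zone is an axis-aligned box (x_min, y_min, x_max, y_max).
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def best_assignment(matrix: list[list[float]]) -> list[tuple[int, int]]:
    """One-to-one (pred, ref) pairs maximizing total IoU.

    Exhaustive search stands in for the Hungarian algorithm here; for
    real workloads scipy.optimize.linear_sum_assignment is the usual choice.
    """
    n_pred = len(matrix)
    n_ref = len(matrix[0]) if matrix else 0
    if n_pred == 0 or n_ref == 0:
        return []
    if n_pred <= n_ref:
        candidates = (list(zip(range(n_pred), p))
                      for p in permutations(range(n_ref), n_pred))
    else:
        candidates = (list(zip(p, range(n_ref)))
                      for p in permutations(range(n_pred), n_ref))
    return max(candidates, key=lambda pairs: sum(matrix[i][j] for i, j in pairs))


def evaluate(predicted: list[Box], reference: list[Box],
             iou_threshold: float = 0.5) -> dict[str, float]:
    """Precision, recall, F1 and mean IoU over pairs above the threshold."""
    matrix = [[iou(p, r) for r in reference] for p in predicted]
    matched = [matrix[i][j] for i, j in best_assignment(matrix)
               if matrix[i][j] > iou_threshold]
    precision = len(matched) / len(predicted) if predicted else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    mean_iou = sum(matched) / len(matched) if matched else 0.0
    return {"matched": len(matched), "precision": precision,
            "recall": recall, "f1": f1, "mean_iou": mean_iou}


if __name__ == "__main__":
    predicted = [(0, 0, 10, 10), (20, 0, 30, 10), (50, 50, 60, 60)]
    reference = [(0, 0, 10, 8), (21, 0, 30, 10)]
    # Two of the three predictions overlap a reference zone enough to match.
    print(evaluate(predicted, reference))
```

Unmatched predictions lower precision and unmatched references lower recall, while the mean IoU only reflects how well the matched pairs overlap.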

Parameters

The list of parameters is detailed in this section.

| Parameter | Description | Type | Default |
| :-------: | :---------: | :--: | :-----: |
| --label-dir | Path to the directory containing JSON label files. | Path | |
| --prediction-dir | Path to the directory containing JSON prediction files. | Path | |
| --config | Path to the configuration file with mapping classes. | Path | |
| --iou-threshold | Minimum IoU threshold to use for matching. | float | None |
| --from-journal | Whether to load files using the Journal format. | bool | False |
| --per-sample | Whether to evaluate metrics for each newspaper page. | bool | False |
| --save-csv-path | Path to a CSV file used to save the evaluation results. | Path | None |
| --allow-partial | Whether to allow a partial match between the files in --label-dir and --prediction-dir. | bool | False |

Examples

Basic evaluation

Run the following command to compute metrics:

newspaper-eval article  --label-dir data/labels/ \
                        --prediction-dir data/predictions/ \
                        --config configs/finlam.yaml \
                        --from-journal

This will output:

INFO     Loading labels...
INFO     Loading prediction...
INFO     The dataset is complete and valid.
INFO     Evaluation:
|  Level  | Precision (%) | Recall (%) | F1 (%) | mIOU (%) | count predicted | count target |
| :-----: | :-----------: | :--------: | :----: | :------: | :-------------: | :----------: |
| article |     62.07     |   64.29    | 63.16  |  80.90   |        58       |      56      |
| section |     73.08     |   67.86    | 70.37  |  80.36   |        26       |      28      |

Evaluation per sample

To compute metrics for each page, use the --per-sample option:

newspaper-eval article  --label-dir data/labels/ \
                        --prediction-dir data/predictions/ \
                        --config configs/finlam.yaml \
                        --from-journal \
                        --per-sample

This will output:

INFO     Loading labels...
INFO     Loading prediction...
INFO     The dataset is complete and valid.
INFO     Evaluation:
|  Level  | Precision (%) | Recall (%) | F1 (%) | mIOU (%) | count predicted | count target |
| :-----: | :-----------: | :--------: | :----: | :------: | :-------------: | :----------: |
| article |     62.07     |   64.29    | 63.16  |  80.90   |        58       |      56      |
| section |     73.08     |   67.86    | 70.37  |  80.36   |        26       |      28      |
INFO     Per sample evaluation:
|     Sample     |  Level  | Precision (%) | Recall (%) | F1 (%) | mIOU (%) | count predicted | count target |
| :------------: | :-----: | :-----------: | :--------: | :----: | :------: | :-------------: | :----------: |
| 4100130_1.json | article |     93.33     |   82.35    | 87.50  |  80.29   |        15       |      17      |
| 4100130_2.json | article |     63.64     |   53.85    | 58.33  |  81.67   |        11       |      13      |
| 4100130_3.json | article |     46.88     |   57.69    | 51.72  |  81.12   |        32       |      26      |
| 4100130_1.json | section |     93.33     |   82.35    | 87.50  |  80.29   |        15       |      17      |
| 4100130_2.json | section |     50.00     |   44.44    | 47.06  |  82.91   |        8        |      9       |
| 4100130_3.json | section |     33.33     |   50.00    | 40.00  |  71.13   |        3        |      2       |
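The global rows in this sample output are consistent with pooling matched pairs across pages, rather than averaging the per-page percentages. The short check below recovers the number of matched pairs per page from the precision and predicted count, then recomputes the global metrics. It is an observation on the figures shown above, not a statement about how the tool is implemented, and `pooled_metrics` is a hypothetical helper.

```python
# (precision %, count predicted, count target) per page, copied from the tables above.
article_rows = [(93.33, 15, 17), (63.64, 11, 13), (46.88, 32, 26)]
section_rows = [(93.33, 15, 17), (50.00, 8, 9), (33.33, 3, 2)]


def pooled_metrics(rows: list[tuple[float, int, int]]) -> tuple[float, float, float]:
    """Recompute global precision, recall and F1 (in %) from per-page rows."""
    # Matched pairs per page = precision * predicted count, rounded to an integer.
    matched = sum(round(precision / 100 * n_pred) for precision, n_pred, _ in rows)
    predicted = sum(n_pred for _, n_pred, _ in rows)
    target = sum(n_target for _, _, n_target in rows)
    precision = 100 * matched / predicted
    recall = 100 * matched / target
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2)


print(pooled_metrics(article_rows))  # (62.07, 64.29, 63.16)
print(pooled_metrics(section_rows))  # (73.08, 67.86, 70.37)
```

Both results agree with the article and section rows of the global table, so a page with many zones weighs more in the global score than a page with few.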

Evaluation and saving results to CSV

To save metrics in a CSV file, use the --save-csv-path option:

newspaper-eval article  --label-dir data/labels/ \
                        --prediction-dir data/predictions/ \
                        --config configs/finlam.yaml \
                        --from-journal \
                        --save-csv-path metrics.csv

This will output:

INFO     Loading labels...
INFO     Loading prediction...
INFO     The dataset is complete and valid.
INFO     Evaluation:
|  Level  | Precision (%) | Recall (%) | F1 (%) | mIOU (%) | count predicted | count target |
| :-----: | :-----------: | :--------: | :----: | :------: | :-------------: | :----------: |
| article |     62.07     |   64.29    | 63.16  |  80.90   |        58       |      56      |
| section |     73.08     |   67.86    | 70.37  |  80.36   |        26       |      28      |
INFO     Saving metrics to CSV: metrics.csv.

This will create a new metrics.csv file:

Level,Precision (%),Recall (%),F1 (%),mIOU (%),count predicted,count target
article,62.07,64.29,63.16,80.90,58,56
section,73.08,67.86,70.37,80.36,26,28
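The saved file can be read back with Python's standard `csv` module, for example to compare runs. The sketch below is one possible way to do it; `load_metrics` is a hypothetical helper, and the sample text is the file content shown above, embedded so that the snippet runs without a file on disk.

```python
import csv
import io

# Content of metrics.csv as shown above, embedded to keep the example self-contained.
SAMPLE_CSV = """Level,Precision (%),Recall (%),F1 (%),mIOU (%),count predicted,count target
article,62.07,64.29,63.16,80.90,58,56
section,73.08,67.86,70.37,80.36,26,28
"""


def load_metrics(stream) -> dict[str, dict[str, float]]:
    """Map each level name to its metrics, converted to floats."""
    metrics = {}
    for row in csv.DictReader(stream):
        level = row.pop("Level")
        metrics[level] = {name: float(value) for name, value in row.items()}
    return metrics


metrics = load_metrics(io.StringIO(SAMPLE_CSV))
for level, values in metrics.items():
    print(level, values["F1 (%)"], values["mIOU (%)"])
```

With a real file, the same function can be called as `load_metrics(open("metrics.csv", newline=""))`.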