Jaccard index
The newspaper-eval jaccard command computes the Jaccard Error Rate, which estimates the count consistency of zone detection, based on surface coverage.
To learn more about the options of this command, run newspaper-eval jaccard --help.
Purpose
This command evaluates the alignment between predicted and ground-truth zones, articles, and sections by performing the following steps:
- For pages: filter predicted and reference zones, and compute the Jaccard Error Rate per zone label.
- For articles and sections:
  - Match predicted and reference articles and sections:
    - Compute an Intersection over Union (IoU) matrix between all predicted and reference articles/sections.
    - Use the Hungarian matching algorithm to pair predicted and reference articles/sections with an IoU greater than 0.5.
  - Compute the Jaccard Error Rate per zone at article/section level.
Parameters
The list of parameters is detailed in this section.
| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| --label-dir | Path to the directory containing JSON label files. |  |  |
| --prediction-dir | Path to the directory containing JSON prediction files. |  |  |
| --config | Path to the configuration file with mapping classes. |  |  |
|  | Minimum IoU threshold to use for matching. |  |  |
| --from-journal | Whether to load files using the Journal format. |  |  |
| --per-sample | Whether to evaluate metrics for each newspaper page. |  |  |
| --save-csv-path | Path to a CSV file used to save the evaluation results. |  |  |
|  | Whether to allow partial match between the files in the label and prediction directories. |  |  |
Examples
Basic evaluation
Run the following command to compute metrics:
newspaper-eval jaccard --label-dir data/labels/ \
--prediction-dir data/predictions/ \
--config configs/finlam.yaml \
--from-journal
The command prints:
INFO Loading labels...
INFO Loading prediction...
INFO The dataset is complete and valid.
INFO Evaluation:
| Level | Class | Jaccard Error Rate (%) | count predicted | count target |
| :-----: | :-------------: | :--------------------: | :-------------: | :----------: |
| page | HEADER-TITLE | 100.00 | 1 | 1 |
| page | HEADER-SUBTITLE | 100.00 | 0 | 0 |
| page | ARTICLE-TITLE | 0.00 | 0 | 2 |
| page | ARTICLE-TEXT | 80.00 | 5 | 4 |
| page | ILLUSTRATEDTEXT | 100.00 | 0 | 0 |
| page | ALL | 85.71 | 6 | 7 |
| article | ALL | 60.00 | 5 | 3 |
| section | ALL | 83.33 | 5 | 6 |
Evaluation per sample
To compute metrics for each page, use the --per-sample option:
newspaper-eval jaccard --label-dir data/labels/ \
--prediction-dir data/predictions/ \
--config configs/finlam.yaml \
--from-journal \
--per-sample
The command prints:
INFO Loading labels...
INFO Loading prediction...
INFO The dataset is complete and valid.
INFO Evaluation:
| Level | Class | Jaccard Error Rate (%) | count predicted | count target |
| :-----: | :-------------: | :--------------------: | :-------------: | :----------: |
| page | HEADER-TITLE | 100.00 | 1 | 1 |
| page | HEADER-SUBTITLE | 100.00 | 0 | 0 |
| page | ARTICLE-TITLE | 0.00 | 0 | 2 |
| page | ARTICLE-TEXT | 80.00 | 5 | 4 |
| page | ILLUSTRATEDTEXT | 100.00 | 0 | 0 |
| page | ALL | 85.71 | 6 | 7 |
| article | ALL | 60.00 | 5 | 3 |
| section | ALL | 83.33 | 5 | 6 |
INFO Per sample evaluation:
| Sample | Level | Jaccard Error Rate (%) | count predicted | count target |
| :----------: | :-----: | :--------------------: | :-------------: | :----------: |
| journal.json | page | 85.71 | 6 | 7 |
| journal.json | article | 60.00 | 5 | 3 |
| journal.json | section | 83.33 | 5 | 6 |
Evaluation and saving results to CSV
To save metrics in a CSV file, use the --save-csv-path option:
newspaper-eval jaccard --label-dir data/labels/ \
--prediction-dir data/predictions/ \
--config configs/finlam.yaml \
--from-journal \
--save-csv-path metrics.csv
The command prints:
INFO Loading labels...
INFO Loading prediction...
INFO The dataset is complete and valid.
INFO Evaluation:
| Level | Class | Jaccard Error Rate (%) | count predicted | count target |
| :-----: | :-------------: | :--------------------: | :-------------: | :----------: |
| page | HEADER-TITLE | 100.00 | 1 | 1 |
| page | HEADER-SUBTITLE | 100.00 | 0 | 0 |
| page | ARTICLE-TITLE | 0.00 | 0 | 2 |
| page | ARTICLE-TEXT | 80.00 | 5 | 4 |
| page | ILLUSTRATEDTEXT | 100.00 | 0 | 0 |
| page | ALL | 85.71 | 6 | 7 |
| article | ALL | 60.00 | 5 | 3 |
| section | ALL | 83.33 | 5 | 6 |
INFO Saving metrics to CSV: metrics.csv.
This creates a new metrics.csv file:
Level,Class,Jaccard Error Rate (%),count predicted,count target
page,HEADER-TITLE,100.00,1,1
page,HEADER-SUBTITLE,100.00,0,0
page,ARTICLE-TITLE,0.00,0,2
page,ARTICLE-TEXT,80.00,5,4
page,ILLUSTRATEDTEXT,100.00,0,0
page,ALL,85.71,6,7
article,ALL,60.00,5,3
section,ALL,83.33,5,6
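The saved file can then be post-processed with any CSV reader. The sketch below uses Python's standard csv module and relies only on the column names visible in the sample file above. The helper name load_metrics is hypothetical, and the file content is inlined so that the example runs on its own.

```python
import csv
import io

# Content of metrics.csv as shown above, inlined to keep the example self-contained.
CSV_TEXT = """\
Level,Class,Jaccard Error Rate (%),count predicted,count target
page,HEADER-TITLE,100.00,1,1
page,HEADER-SUBTITLE,100.00,0,0
page,ARTICLE-TITLE,0.00,0,2
page,ARTICLE-TEXT,80.00,5,4
page,ILLUSTRATEDTEXT,100.00,0,0
page,ALL,85.71,6,7
article,ALL,60.00,5,3
section,ALL,83.33,5,6
"""


def load_metrics(handle):
    """Parse the metrics CSV into a list of dicts with typed values."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "level": row["Level"],
            "class": row["Class"],
            "rate": float(row["Jaccard Error Rate (%)"]),
            "predicted": int(row["count predicted"]),
            "target": int(row["count target"]),
        })
    return rows


metrics = load_metrics(io.StringIO(CSV_TEXT))

# Keep only the aggregated "ALL" row of each level.
overall = {m["level"]: m["rate"] for m in metrics if m["class"] == "ALL"}
print(overall)  # {'page': 85.71, 'article': 60.0, 'section': 83.33}
```

To read the real file instead of the inlined text, pass an open file handle, for example open("metrics.csv", newline=""), to load_metrics.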