PyLaia Analyze

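The `atr pylaia-analyze` command compares predictions with ground-truth transcriptions and generates an error report in HTML or JSON format, listing the Word Error Rate (WER) and Character Error Rate (CER) of each sample.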

Parameters

Parameter	Description	Type	Default
`--format`	Output format for the report (`"html"`, `"json"`)	`str`
`--predictions`	Path to the prediction file.	`Path`
`--labels`	Path to the truth file.	`Path`
`--images`	Path to the images.	`Path`
`--image-ext`	Image extension.	`str`	`.jpg`
`--preprocess`	A list of preprocessing functions to be applied on truth and predicted texts. Accepted preprocessing functions are: `ignore_case`, `ignore_punct`, `ignore_numbers`, `escape_punct`.	`List[str]`	`[]`
`--confidence-scores`	Whether the prediction file includes confidence scores.	`bool`	`False`
`--export-all`	Whether to export all images. By default, only images with a Word Error Rate > 0 will be exported.	`bool`	`False`

Four preprocessing functions are available:

ignore_case: Lowercase the text before computing error rates,
ignore_punct: Ignore the punctuation before computing error rates,
ignore_numbers: Ignore all numbers before computing error rates,
escape_punct: Consider punctuation characters as separate words.
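
The effect of each function can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function bodies below are assumptions written only to mirror the descriptions above, and they handle ASCII punctuation only (`string.punctuation`).

```python
import string


def ignore_case(text: str) -> str:
    # Lowercase the text.
    return text.lower()


def ignore_punct(text: str) -> str:
    # Drop ASCII punctuation characters (assumption: ASCII only).
    return "".join(ch for ch in text if ch not in string.punctuation)


def ignore_numbers(text: str) -> str:
    # Drop every digit.
    return "".join(ch for ch in text if not ch.isdigit())


def escape_punct(text: str) -> str:
    # Surround punctuation with spaces so that each mark becomes its own word.
    spaced = "".join(f" {ch} " if ch in string.punctuation else ch for ch in text)
    return " ".join(spaced.split())


# Hypothetical helper: apply several functions in order, as a list of names.
PREPROCESSORS = {
    "ignore_case": ignore_case,
    "ignore_punct": ignore_punct,
    "ignore_numbers": ignore_numbers,
    "escape_punct": escape_punct,
}


def preprocess(text: str, names: list[str]) -> str:
    for name in names:
        text = PREPROCESSORS[name](text)
    return text


print(preprocess("SS: 44: Oeconomie Mål.", ["ignore_case", "ignore_numbers"]))
print(escape_punct("uti 2:ne klyfva,"))
```

The same transformation has to be applied to both the truth and the predicted text, otherwise the comparison is no longer meaningful.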

Examples

Note that the generated report is sorted by decreasing Word Error Rate.

Create a JSON report

To generate a JSON report, run the following command:

atr pylaia-analyze --format json --predictions tests/examples/pred_test.txt --labels tests/examples/truth_test.txt --images path/to/images/ > report.json

The output will be saved to report.json:

{
    "naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3": {
        "src": "path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3.jpg",
        "ref": "dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,",
        "hyp": "dom emellan uti 8ns jamn goda delar förslagfvis klyfve,",
        "wer": {
            "error_rate": 75.0,
            "edit_distance": 6,
            "length": 8
        },
        "cer": {
            "error_rate": 14.58,
            "edit_distance": 7,
            "length": 48
        }
    },
    ...
}
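
To make the fields of the report concrete, the sketch below recomputes the `wer` and `cer` entries of the sample above in plain Python. It is an illustration, not the tool's code: it assumes a standard Levenshtein distance, an error rate of `100 * edit_distance / length` rounded to two decimals, and a CER computed over non-space characters. These assumptions are inferred from the sample values; `edit_distance` and `score` are hypothetical helper names.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion of ref_item
                    current[j - 1] + 1,  # insertion of hyp_item
                    previous[j - 1] + (ref_item != hyp_item),  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def score(ref, hyp):
    """Build an entry shaped like the "wer" / "cer" objects of the report."""
    distance = edit_distance(ref, hyp)
    return {
        "error_rate": round(100 * distance / len(ref), 2),
        "edit_distance": distance,
        "length": len(ref),
    }


# Normalize to NFC so that "ä" and "ö" each count as a single character.
ref = unicodedata.normalize("NFC", "dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,")
hyp = unicodedata.normalize("NFC", "dom emellan uti 8ns jamn goda delar förslagfvis klyfve,")

wer = score(ref.split(), hyp.split())  # word level
cer = score(ref.replace(" ", ""), hyp.replace(" ", ""))  # character level, spaces removed

print(wer)  # {'error_rate': 75.0, 'edit_distance': 6, 'length': 8}
print(cer)  # {'error_rate': 14.58, 'edit_distance': 7, 'length': 48}
```

In both cases `length` is the length of the reference, so an error rate can exceed 100 when the prediction is much longer than the truth.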

Create a complete JSON report

To generate a complete JSON report, run the following command:

atr pylaia-analyze --format json --predictions tests/examples/pred_test.txt --labels tests/examples/truth_test.txt --images path/to/images/ --export-all > report_complete.json

The output will include all samples from the test set, including those with a perfect score, and will be saved to report_complete.json:

{
    "naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3": {
        "src": "path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3.jpg",
        "ref": "dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,",
        "hyp": "dom emellan uti 8ns jamn goda delar förslagfvis klyfve,",
        "wer": {
            "error_rate": 75.0,
            "edit_distance": 6,
            "length": 8
        },
        "cer": {
            "error_rate": 14.58,
            "edit_distance": 7,
            "length": 48
        }
    },
    ...
    "naf/00586988-8313-43fb-a419-3d7ddd895c30_011_27833e16-099c-4f24-9267-2c492e51415c": {
        "src": "path/to/images/naf/00586988-8313-43fb-a419-3d7ddd895c30_011_27833e16-099c-4f24-9267-2c492e51415c.jpg",
        "ref": "med vilkor att, innan andra uppbudet kunde meddelas,",
        "hyp": "med vilkor att, innan andra uppbudet kunde meddelas,",
        "wer": {
            "error_rate": 0.0,
            "edit_distance": 0,
            "length": 8
        },
        "cer": {
            "error_rate": 0.0,
            "edit_distance": 0,
            "length": 45
        }
    }
}
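
Because the complete report contains every sample, it can also be used to compute corpus-level figures. The sketch below is one possible way to do this in Python and is not a feature of the command: it sums edit distances and reference lengths over all samples (a micro-average) and lists the samples with a perfect score. The two entries are the ones shown above, trimmed to their metric fields; `corpus_rate` is a hypothetical helper name.

```python
import json

# In practice, load the file instead:
#   with open("report_complete.json", encoding="utf-8") as f:
#       report = json.load(f)
report = json.loads("""
{
    "naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3": {
        "wer": {"error_rate": 75.0, "edit_distance": 6, "length": 8},
        "cer": {"error_rate": 14.58, "edit_distance": 7, "length": 48}
    },
    "naf/00586988-8313-43fb-a419-3d7ddd895c30_011_27833e16-099c-4f24-9267-2c492e51415c": {
        "wer": {"error_rate": 0.0, "edit_distance": 0, "length": 8},
        "cer": {"error_rate": 0.0, "edit_distance": 0, "length": 45}
    }
}
""")


def corpus_rate(report, metric):
    """Micro-averaged error rate (in %) for "wer" or "cer"."""
    edits = sum(sample[metric]["edit_distance"] for sample in report.values())
    length = sum(sample[metric]["length"] for sample in report.values())
    return round(100 * edits / length, 2)


# Samples whose prediction matches the truth exactly.
perfect = [name for name, sample in report.items() if sample["wer"]["edit_distance"] == 0]

print(corpus_rate(report, "wer"))  # 37.5
print(corpus_rate(report, "cer"))  # 7.53
print(len(perfect))  # 1
```

Summing distances and lengths weights long lines more than short ones, which differs from averaging the per-sample `error_rate` values.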

Create a JSON report with confidence scores

To generate a JSON report with confidence scores, run the following command:

atr pylaia-analyze --format json --predictions tests/examples/pred_test_confidence.txt --labels tests/examples/truth_test.txt --images path/to/images/ --confidence-scores > report_confidence.json

The output will include confidence scores and will be saved to report_confidence.json:

{
    "naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3": {
        "src": "path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3.jpg",
        "ref": "dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,",
        "hyp": "dom emellan uti 8ns jamn goda delar förslagfvis klyfve,",
        "wer": {
            "error_rate": 75.0,
            "edit_distance": 6,
            "length": 8
        },
        "cer": {
            "error_rate": 14.58,
            "edit_distance": 7,
            "length": 48
        },
        "conf": "0.86"
    },
    ...
}
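
In the sample above, `conf` is stored as a string, so it needs to be converted before any numeric comparison. As a hedged illustration, the Python sketch below flags samples whose confidence falls under a threshold; `low_confidence` and the threshold values are hypothetical, and the entry is the one shown above, trimmed to the fields used here.

```python
# In practice, load report_confidence.json with json.load() instead.
report = {
    "naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3": {
        "wer": {"error_rate": 75.0, "edit_distance": 6, "length": 8},
        "cer": {"error_rate": 14.58, "edit_distance": 7, "length": 48},
        "conf": "0.86",
    },
}


def low_confidence(report, threshold):
    """Return (sample id, confidence) pairs below the threshold, lowest first."""
    flagged = []
    for name, sample in report.items():
        confidence = float(sample["conf"])  # "conf" is a string in the report
        if confidence < threshold:
            flagged.append((name, confidence))
    return sorted(flagged, key=lambda pair: pair[1])


for name, confidence in low_confidence(report, threshold=0.9):
    print(f"{confidence:.2f}  {name}")
```

Comparing confidence with the error rates of the same sample is one way to check whether the model's confidence is a useful indicator of transcription quality.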

Create a JSON report with preprocessed text

To generate a JSON report with text preprocessing/normalization (here, ignoring case and numbers), run the following command:

atr pylaia-analyze --format json --predictions tests/examples/pred_test.txt --labels tests/examples/truth_test.txt --images path/to/images/ --preprocess ignore_case ignore_numbers > report_preprocessing.json

The output will be saved to report_preprocessing.json:

{
    "naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3": {
        "src": "path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3.jpg",
        "ref": "dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,",
        "hyp": "dom emellan uti 8ns jamn goda delar förslagfvis klyfve,",
        "wer": {
            "error_rate": 75.0,
            "edit_distance": 6,
            "length": 8
        },
        "cer": {
            "error_rate": 12.77,
            "edit_distance": 6,
            "length": 47
        }
    },
    ...
}

Create an HTML report

To generate an HTML report, run the following command:

atr pylaia-analyze --format html --predictions tests/examples/pred_test.txt --labels tests/examples/truth_test.txt --images path/to/images/ --image-ext .png > report.html

The output will be saved to report.html:

<!DOCTYPE html>
<html lang="en">
    <head>
    ...
    </head>
    <body>
    ...
        <img src="path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3.png" loading="lazy"></br>
        <pre>naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3</pre><br/>
        <pre>WER: <strong>75.0</strong></pre></br>
        <pre>CER: <strong>14.58</strong></pre></br>
        <pre>ref: dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,</pre></br>
        <pre>hyp: dom emellan uti 8ns jamn goda delar förslagfvis klyfve,</pre></br>
        <br/>
        <img src="path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_046_f44c0cc5-1e11-4b1e-915e-6566f60492fa.png" loading="lazy"></br>
        <pre>naf/0023182e-5b60-42d6-af90-1b665ccacf0d_046_f44c0cc5-1e11-4b1e-915e-6566f60492fa</pre><br/>
        <pre>WER: <strong>75.0</strong></pre></br>
        <pre>CER: <strong>26.32</strong></pre></br>
        <pre>ref: SS: 44: Oeconomie Mål.</pre></br>
        <pre>hyp: SS: 44. Teconomie måle¬</pre></br>
        <br/>
        ...
    </body>
</html>

This report can be used to visualise and analyse results.