PyLaia Analyze
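The `atr pylaia-analyze` command compares predicted transcriptions with ground-truth labels and generates an error report, giving the Word Error Rate (WER) and Character Error Rate (CER) of each line, in HTML or JSON format.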
Parameters
| Parameter | Description |
|---|---|
| `--format` | Output format for the report (`json` or `html`). |
| `--predictions` | Path to the prediction file. |
| `--labels` | Path to the truth file. |
| `--images` | Path to the images. |
| `--image-ext` | Image extension. |
| `--preprocess` | A list of preprocessing functions to be applied on truth and predicted texts. Accepted values: `ignore_case`, `ignore_punct`, `ignore_numbers`, `escape_punct`. |
| `--confidence-scores` | Whether the prediction file includes confidence scores. |
| `--export-all` | Whether to export all images. By default, only images with a Word Error Rate > 0 are exported. |
Four preprocessing functions are available:
- `ignore_case`: lowercase the text before computing error rates,
- `ignore_punct`: ignore punctuation before computing error rates,
- `ignore_numbers`: ignore all numbers before computing error rates,
- `escape_punct`: treat punctuation characters as separate words.
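To make the effect of each function concrete, here is a minimal sketch of what they might look like. These are hypothetical re-implementations written for illustration, not the code used by `atr`; in particular, "punctuation" is assumed to mean the ASCII characters in Python's `string.punctuation`, so a mark such as `¬` would be left untouched here even if the real tool treats it differently. The helper `preprocess` and the `PREPROCESSORS` registry are also assumptions.

```python
import re
import string
from typing import Iterable

# Hypothetical re-implementations for illustration only; not the atr source code.
# "Punctuation" is assumed to be the ASCII set in string.punctuation.
_PUNCT_CLASS = f"([{re.escape(string.punctuation)}])"


def ignore_case(text: str) -> str:
    """Lowercase the text."""
    return text.lower()


def ignore_punct(text: str) -> str:
    """Remove every ASCII punctuation character."""
    return text.translate(str.maketrans("", "", string.punctuation))


def ignore_numbers(text: str) -> str:
    """Remove every decimal digit."""
    return re.sub(r"\d", "", text)


def escape_punct(text: str) -> str:
    """Surround punctuation with spaces so each mark becomes its own word."""
    spaced = re.sub(_PUNCT_CLASS, r" \1 ", text)
    return " ".join(spaced.split())  # collapse the extra whitespace


PREPROCESSORS = {
    "ignore_case": ignore_case,
    "ignore_punct": ignore_punct,
    "ignore_numbers": ignore_numbers,
    "escape_punct": escape_punct,
}


def preprocess(text: str, names: Iterable[str]) -> str:
    """Apply the named functions in the order given (hypothetical helper)."""
    for name in names:
        text = PREPROCESSORS[name](text)
    return text
```

For example, `preprocess("Dem 2:ne", ["ignore_case", "ignore_numbers"])` returns `"dem :ne"`, while `escape_punct("klyfva,")` returns `"klyfva ,"`, which adds one word to the line when computing WER.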
Examples
Note that the generated report is sorted by decreasing Word Error Rate.
Create a JSON report
To generate a JSON report, run the following command:
```shell
atr pylaia-analyze --format json --predictions tests/examples/pred_test.txt --labels tests/examples/truth_test.txt --images path/to/images/ > report.json
```
The output will be saved to report.json:
```json
{
  "naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3": {
    "src": "path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3.jpg",
    "ref": "dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,",
    "hyp": "dom emellan uti 8ns jamn goda delar förslagfvis klyfve,",
    "wer": {
      "error_rate": 75.0,
      "edit_distance": 6,
      "length": 8
    },
    "cer": {
      "error_rate": 14.58,
      "edit_distance": 7,
      "length": 48
    }
  },
  ...
}
```
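The `wer` and `cer` entries in the example are consistent with `error_rate = round(100 * edit_distance / length, 2)`: 6 / 8 gives 75.0 and 7 / 48 gives 14.58. The sketch below shows how such an entry could be computed with a Levenshtein distance over words (WER) and over characters (CER). It assumes whitespace tokenization and counts every character, spaces included, so its character lengths are not guaranteed to match the tool's (the example reports 48); the function names are hypothetical.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match when equal)
            ))
        prev = curr
    return prev[-1]


def error_entry(ref_units: Sequence, hyp_units: Sequence) -> dict:
    """Build an {error_rate, edit_distance, length} entry like those in the report."""
    distance = levenshtein(ref_units, hyp_units)
    length = len(ref_units)
    # Assumption: an empty reference is reported as 0.0 here; the rate is undefined.
    rate = round(100 * distance / length, 2) if length else 0.0
    return {"error_rate": rate, "edit_distance": distance, "length": length}


def score_line(ref: str, hyp: str) -> dict:
    """Word-level and character-level scores for one line (hypothetical helper)."""
    return {
        "wer": error_entry(ref.split(), hyp.split()),
        "cer": error_entry(list(ref), list(hyp)),
    }
```

On the line shown above, `score_line(ref, hyp)["wer"]` gives `{'error_rate': 75.0, 'edit_distance': 6, 'length': 8}`: four substituted words, the split `jamn goda` (one substitution plus one insertion) and `dom`.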
Create a complete JSON report
To generate a complete JSON report, run the following command:
```shell
atr pylaia-analyze --format json --predictions tests/examples/pred_test.txt --labels tests/examples/truth_test.txt --images path/to/images/ --export-all > report_complete.json
```
The output will include all samples from the test set, including those with a perfect score, and will be saved to report_complete.json:
```json
{
  "naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3": {
    "src": "path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3.jpg",
    "ref": "dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,",
    "hyp": "dom emellan uti 8ns jamn goda delar förslagfvis klyfve,",
    "wer": {
      "error_rate": 75.0,
      "edit_distance": 6,
      "length": 8
    },
    "cer": {
      "error_rate": 14.58,
      "edit_distance": 7,
      "length": 48
    }
  },
  ...
  "naf/00586988-8313-43fb-a419-3d7ddd895c30_011_27833e16-099c-4f24-9267-2c492e51415c": {
    "src": "path/to/images/naf/00586988-8313-43fb-a419-3d7ddd895c30_011_27833e16-099c-4f24-9267-2c492e51415c.jpg",
    "ref": "med vilkor att, innan andra uppbudet kunde meddelas,",
    "hyp": "med vilkor att, innan andra uppbudet kunde meddelas,",
    "wer": {
      "error_rate": 0.0,
      "edit_distance": 0,
      "length": 8
    },
    "cer": {
      "error_rate": 0.0,
      "edit_distance": 0,
      "length": 45
    }
  }
}
```
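Because the complete report contains every line, it can be aggregated into corpus-level figures. The sketch below assumes the report has been parsed into a dict with the structure shown above; it counts perfect lines, picks the line with the highest WER, and computes a corpus-level rate as total edit distance over total reference length. That micro-average is a choice made here, not something the tool is documented to report, and `load_report` and `summarize` are hypothetical names.

```python
import json


def load_report(path: str) -> dict:
    """Read a JSON report written by `atr pylaia-analyze --format json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(report: dict) -> dict:
    """Aggregate a non-empty parsed report into a few corpus-level figures."""
    perfect = [key for key, entry in report.items() if entry["wer"]["edit_distance"] == 0]
    worst = max(report, key=lambda key: report[key]["wer"]["error_rate"])

    def corpus_rate(metric: str) -> float:
        # Micro-average: sum of edit distances over sum of reference lengths.
        errors = sum(entry[metric]["edit_distance"] for entry in report.values())
        length = sum(entry[metric]["length"] for entry in report.values())
        return round(100 * errors / length, 2)

    return {
        "lines": len(report),
        "perfect": len(perfect),
        "worst": worst,
        "wer": corpus_rate("wer"),
        "cer": corpus_rate("cer"),
    }
```

Usage: `summarize(load_report("report_complete.json"))`. With only the two lines displayed above, the corpus WER would be 6 / 16 = 37.5 and the corpus CER 7 / 93 ≈ 7.53, which differs from the plain mean of per-line rates because longer lines weigh more.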
Create a JSON report with confidence scores
To generate a JSON report with confidence scores, run the following command:
```shell
atr pylaia-analyze --format json --predictions tests/examples/pred_test_confidence.txt --labels tests/examples/truth_test.txt --images path/to/images/ --confidence-scores > report_confidence.json
```
The output will include confidence scores and will be saved to report_confidence.json:
```json
{
  "naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3": {
    "src": "path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3.jpg",
    "ref": "dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,",
    "hyp": "dom emellan uti 8ns jamn goda delar förslagfvis klyfve,",
    "wer": {
      "error_rate": 75.0,
      "edit_distance": 6,
      "length": 8
    },
    "cer": {
      "error_rate": 14.58,
      "edit_distance": 7,
      "length": 48
    },
    "conf": "0.86"
  },
  ...
}
```
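A common use of the `conf` field is to find lines where the model was confident yet wrong. The sketch below assumes a parsed report like the one above, where `conf` is stored as a string (as in `"0.86"`) and is therefore converted with `float`; the function name and the 0.8 threshold are hypothetical choices.

```python
from typing import Dict, List, Tuple


def overconfident_errors(report: Dict[str, dict], min_conf: float = 0.8) -> List[Tuple[str, float, float]]:
    """Return (line id, confidence, WER) for lines with confidence >= min_conf
    but a non-zero WER, most confident first."""
    flagged = []
    for key, entry in report.items():
        conf = float(entry["conf"])  # stored as a string in the example output
        wer = entry["wer"]["error_rate"]
        if conf >= min_conf and wer > 0:
            flagged.append((key, conf, wer))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

With the line shown above, the default threshold would flag it as `(id, 0.86, 75.0)`: a confidence of 0.86 despite six word errors out of eight.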
Create a JSON report with preprocessed text
To generate a JSON report with text preprocessing/normalization (ignore case, ignore numbers), run the following command:
```shell
atr pylaia-analyze --format json --predictions tests/examples/pred_test.txt --labels tests/examples/truth_test.txt --images path/to/images/ --preprocess ignore_case ignore_numbers > report_preprocessing.json
```
The output will be saved to report_preprocessing.json:
```json
{
  "naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3": {
    "src": "path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3.jpg",
    "ref": "dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,",
    "hyp": "dom emellan uti 8ns jamn goda delar förslagfvis klyfve,",
    "wer": {
      "error_rate": 75.0,
      "edit_distance": 6,
      "length": 8
    },
    "cer": {
      "error_rate": 12.77,
      "edit_distance": 6,
      "length": 47
    }
  },
  ...
}
```
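Comparing this output with the unprocessed report shows the effect of normalization: the CER of the line drops from 14.58 to 12.77 once digits are ignored. The sketch below computes that per-line change for two parsed reports. It only compares lines present in both, since reports exported without `--export-all` omit lines with a WER of 0 and may therefore contain different lines; `metric_deltas` is a hypothetical helper.

```python
def metric_deltas(before: dict, after: dict, metric: str = "cer") -> dict:
    """Per-line change in error rate (after minus before) for lines present in
    both parsed reports; negative values mean preprocessing lowered the rate."""
    return {
        key: round(after[key][metric]["error_rate"] - before[key][metric]["error_rate"], 2)
        for key in before.keys() & after.keys()
    }
```

For the line above, passing the parsed `report.json` as `before` and `report_preprocessing.json` as `after` would give a CER delta of -1.81 and, with `metric="wer"`, a WER delta of 0.0.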
Create an HTML report
To generate an HTML report, run the following command:
```shell
atr pylaia-analyze --format html --predictions tests/examples/pred_test.txt --labels tests/examples/truth_test.txt --images path/to/images/ --image-ext .png > report.html
```
The output will be saved to report.html:
```html
<!DOCTYPE html>
<html lang="en">
<head>
...
</head>
<body>
...
<img src="path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3.png" loading="lazy"></br>
<pre>naf/0023182e-5b60-42d6-af90-1b665ccacf0d_033_6b66ef36-2c09-4838-b3b0-2622e75960a3</pre><br/>
<pre>WER: <strong>75.0</strong></pre></br>
<pre>CER: <strong>14.58</strong></pre></br>
<pre>ref: dem emellan uti 2:ne jämngoda delar förslagsvis klyfva,</pre></br>
<pre>hyp: dom emellan uti 8ns jamn goda delar förslagfvis klyfve,</pre></br>
<br/>
<img src="path/to/images/naf/0023182e-5b60-42d6-af90-1b665ccacf0d_046_f44c0cc5-1e11-4b1e-915e-6566f60492fa.png" loading="lazy"></br>
<pre>naf/0023182e-5b60-42d6-af90-1b665ccacf0d_046_f44c0cc5-1e11-4b1e-915e-6566f60492fa</pre><br/>
<pre>WER: <strong>75.0</strong></pre></br>
<pre>CER: <strong>26.32</strong></pre></br>
<pre>ref: SS: 44: Oeconomie Mål.</pre></br>
<pre>hyp: SS: 44. Teconomie måle¬</pre></br>
<br/>
...
</body>
</html>
```
Open this report in a browser to visualise and analyse the results: each line image is displayed with its identifier, WER, CER, reference and hypothesis.
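All of the examples on this page combine the same handful of flags, so when scripting several runs it can help to assemble the command in one place. The sketch below builds the argument list using only the flags shown in the examples; the function and its parameter names are hypothetical, and it does not run anything itself.

```python
import shlex
from typing import List, Optional, Sequence


def build_analyze_command(
    fmt: str,
    predictions: str,
    labels: str,
    images: str,
    image_ext: Optional[str] = None,
    preprocess: Sequence[str] = (),
    confidence_scores: bool = False,
    export_all: bool = False,
) -> List[str]:
    """Assemble an `atr pylaia-analyze` argument list (hypothetical helper)."""
    args = [
        "atr", "pylaia-analyze",
        "--format", fmt,
        "--predictions", predictions,
        "--labels", labels,
        "--images", images,
    ]
    if image_ext is not None:
        args += ["--image-ext", image_ext]
    if preprocess:
        # Several function names follow a single --preprocess flag.
        args += ["--preprocess", *preprocess]
    if confidence_scores:
        args.append("--confidence-scores")
    if export_all:
        args.append("--export-all")
    return args


def as_shell(args: List[str]) -> str:
    """Render the argument list as a shell-quoted command line."""
    return shlex.join(args)
```

Since the report is written to standard output, running the list with `subprocess.run(args, stdout=out, check=True)`, where `out` is a file opened for writing, plays the role of the `>` redirection used in the examples.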
