Evaluation
Use the teklia-qwen evaluate command to evaluate QWEN’s predictions.
| Parameter | Description | Type | Default |
|---|---|---|---|
| --labels | Path to the JSONL label file. | | |
| --predictions | Path to the JSON prediction file. | | |
| --entities | Whether to compute Nerval scores and entity-based error rates. | | |
| --show-n-worst | Show the n worst samples. | | |
| --nerval-threshold | Nerval threshold (entities will be matched if their CER is < threshold). | | 0 |
Examples
Evaluate a model
By default, the evaluation calculates CER, WER, and a Format Score, which reflects the percentage of correctly formatted predictions.
- Command to use:

      teklia-qwen evaluate --labels test.jsonl --predictions predict_test.json

- Output:

      2025-09-11 17:53:06,581 INFO/qwen.evaluate: Summary:
      | Set  | CER (%) | WER (%) | Format Score (%) | N samples |
      |:----:|:-------:|:-------:|:----------------:|:---------:|
      | test |  9.16   |  25.85  |      100.0       |    81     |
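To make the three summary columns concrete, here is a minimal Python sketch of how such metrics are commonly computed. It is not taken from the teklia-qwen code base: the function names are hypothetical, the error rates are aggregated over the whole set (total edits divided by total label length), and "correctly formatted" is assumed to mean "parses as JSON". The tool itself may normalize text or aggregate differently.

```python
import json


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(labels, predictions, tokenize):
    """Error rate in percent: total edits divided by total label length."""
    edits = sum(
        edit_distance(tokenize(label), tokenize(pred))
        for label, pred in zip(labels, predictions)
    )
    length = sum(len(tokenize(label)) for label in labels)
    return 100 * edits / length


def format_score(predictions):
    """Percentage of predictions that parse as JSON (assumed definition)."""
    def is_valid(text):
        try:
            json.loads(text)
        except json.JSONDecodeError:
            return False
        return True

    return 100 * sum(is_valid(p) for p in predictions) / len(predictions)


# Illustrative samples, not taken from a real dataset.
labels = ['[{"nom": "Bosc"}]', '[{"nom": "Roux"}]']
predictions = ['[{"nom": "Bose"}]', '[{"nom": "Roux"}]']

cer = error_rate(labels, predictions, list)       # character level
wer = error_rate(labels, predictions, str.split)  # whitespace-separated words
score = format_score(predictions)
print(f"CER={cer:.2f} WER={wer:.2f} Format={score:.1f}")
```

Passing `list` splits a string into characters and `str.split` splits it into words, so the same distance function serves both rates.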
Evaluate and show n worst images
- Command to use:

      teklia-qwen evaluate --labels test.jsonl \
          --predictions predict_test.json \
          --show-n-worst 3

- Output:
2025-09-15 13:54:45,999 INFO/qwen.evaluate: Summary: | Set | CER (%) | WER (%) | Format Score (%) | N samples | |:----:|:-------:|:-------:|:----------------:|:---------:| | test | 9.16 | 25.85 | 100.0 | 81 | 2025-09-15 13:54:46,000 INFO/qwen.evaluate: Worst samples: | Image ID | CER (%) | WER (%) | Format | Label | Prediction | |:------------------------------------:|:-------:|:-------:|:------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| | 434bf324-07a9-4bb7-9b63-9d1af2ada29c | 61.77 | 78.46 | True | [{"nom": "Pauvrau"}, {"nom": "Paul\u00e9"}, {"nom": "Lanthivine"}, {"nom": "Guignon"}, {"nom": "Bosc"}, {"nom": "Mant\u00e9"}, {"nom": "Gueirard"}, {"nom": "Pr\u00e9bois"}, {"nom": "Dard"}, {"nom": "Nevi\u00e8re"}, {"nom": "Jez\u00e9quel"}, {"nom": "Barrol"}, {"nom": "Barth\u00e9lemy"}] | [{"nom": "Amorau"}, {"nom": "Paul\u00e9"}, {"nom": "Lanthivone"}, {"nom": "Guiguer"}, {"nom": "Bose"}, {"nom": "Mante"}, {"nom": "Gucirard"}, {"nom": "Soci\u00e9t\u00e9 des gran\u00e8s bassins"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}, {"nom": "Idem"}] | | 0457d388-57c1-4fa8-9cfd-1b684c1e6ea9 | 25.31 | 44.34 | True | [{"nom": "Cadi\u00e9re"}, {"nom": "Mourou P\u00e9cunia"}, {"nom": "Jouveucel", "prenom": "Marcel Curet"}, {"nom": "Lercari"}, {"nom": "Bianco Giraud"}, {"nom": "Roux"}, {"nom": "Matharon"}, {"nom": "Audrieu", "prenom": "Ou\u00e9te"}, {"prenom": "Pigoul"}, {"nom": "Barseus Chardonne"}, {"nom": "R\u00e9cugis", "prenom": "Louis"}, {"nom": "Paul", "prenom": "Rapharet"}, {"nom": "B\u00e9renger"}, {"nom": "Fournier"}, {"nom": "Flotte"}, {"nom": "Rivi\u00e9re"}, {"nom": "Boeuf", "prenom": "Christin"}, {"nom": "Sicard", "prenom": "Adolphe des Cables"}, {"nom": "Herautte"}, {"nom": "Chardonne"}, {"nom": "Jaune", "prenom": "Matharon"}, {"nom": "Piguol"}, {"nom": "Bianco"}, {"nom": "Toche"}, {"nom": "B\u00e9levy", 
"prenom": "Bernard"}, {"nom": "David"}, {"nom": "Coste"}, {"nom": "Crastain"}, {"nom": "Laurent"}, {"nom": "Maurras"}, {"nom": "Orichioni", "prenom": "Carrega"}, {"nom": "Berre"}, {"nom": "Bouguier"}, {"nom": "Ou\u00e9te"}, {"nom": "Fortun\u00e9"}, {"nom": "Long"}, {"nom": "Touati Roux"}, {"nom": "Tric Tric"}, {"nom": "Cay"}, {"nom": "Giraud"}] | [{"nom": "Cadi\u00e8re"}, {"nom": "Mourou"}, {"nom": "Jouvencel"}, {"nom": "Lercari"}, {"nom": "Bianco"}, {"nom": "Roux"}, {"nom": "Matharon"}, {"nom": "Audrieu"}, {"nom": "Piguel"}, {"nom": "Barreus"}, {"nom": "R\u00e9cugis"}, {"nom": "Paul"}, {"nom": "B\u00e9renger"}, {"nom": "Fournier"}, {"nom": "Flotte"}, {"nom": "Nivi\u00e8re"}, {"nom": "Boeuf"}, {"nom": "Sicard"}, {"nom": "Herautte"}, {"nom": "Chardonne"}, {"nom": "Jaume"}, {"nom": "Piguel"}, {"nom": "Bianco"}, {"nom": "Toche"}, {"nom": "S\u00e9levy"}, {"nom": "David"}, {"nom": "Coste"}, {"nom": "Crastain"}, {"nom": "Laurent"}, {"nom": "Maurras"}, {"nom": "Oriolioni"}, {"nom": "Berre"}, {"nom": "Rousquier"}, {"nom": "On\u00e9ti"}, {"nom": "Fortun\u00e9"}, {"nom": "Long"}, {"nom": "Touati"}, {"nom": "Tric"}, {"nom": "Cay"}, {"nom": "Giraud"}] | | aea27708-2ab4-4643-b309-cbd619f5b466 | 20.84 | 34.09 | True | [{"nom": "Garnier"}, {"nom": "Girardello"}, {"nom": "Barbieri"}, {"nom": "Giraud"}, {"nom": "Renoux"}, {"nom": "Nanasque"}, {"nom": "Pons"}, {"nom": "Costa"}, {"nom": "Costa"}, {"nom": "Cessin"}, {"nom": "Soulari"}, {"nom": "Mascilesie"}, {"nom": "Figoli"}, {"nom": "Molfino"}, {"nom": "Garnier"}, {"nom": "Garello"}, {"nom": "Ginouv\u00e9s"}, {"nom": "Lamonico"}] | [{"nom": "Garnier"}, {"nom": "Girandello"}, {"nom": "Barbieri"}, {"nom": "Giraud", "prenom": "Jouglas"}, {"nom": "Renoux"}, {"nom": "Nanasque"}, {"nom": "Point"}, {"nom": "Costa"}, {"nom": "Costa"}, {"nom": "Cessin", "prenom": "C\u00e9saire"}, {"nom": "Soulari"}, {"nom": "Masalesi", "prenom": "Delague"}, {"nom": "Figoli"}, {"nom": "Molfino"}, {"nom": "Garnier"}, {"nom": "Garello"}, {"nom": 
"Genouv\u00e9s", "prenom": "Delague"}, {"nom": "Lamonica"}] |
Evaluate a model with Nerval
To calculate Nerval metrics and per-category error rates, use the --entities option.
- Command to use:

      teklia-qwen evaluate --labels test.jsonl \
          --predictions predict_test.json \
          --entities

- Output:

      2025-09-15 13:48:24,048 INFO/qwen.evaluate: Summary:
      | Set  | CER (%) | WER (%) | Format Score (%) | N samples |
      |:----:|:-------:|:-------:|:----------------:|:---------:|
      | test |  9.16   |  25.85  |      100.0       |    81     |
      2025-09-15 13:51:45,698 INFO/bio_parser.utils: Loading labels...
      2025-09-15 13:51:45,705 INFO/bio_parser.utils: Loading prediction...
      2025-09-15 13:51:45,711 INFO/bio_parser.utils: The dataset is complete and valid.
      2025-09-15 13:51:45,813 INFO/qwen.evaluate: Nerval evaluation:
      | Category | Precision (%) | Recall (%) | F1 (%) | Support |
      |:---------|:-------------:|:----------:|:------:|:-------:|
      | nom      |     56.17     |   56.93    | 56.55  |  1990   |
      | prenom   |     40.87     |   40.55    | 40.71  |   254   |
      | total    |     54.47     |   55.08    | 54.78  |  2244   |
      2025-09-15 13:51:45,814 INFO/bio_parser.utils: Loading labels...
      2025-09-15 13:51:45,820 INFO/bio_parser.utils: Loading prediction...
      2025-09-15 13:51:45,825 INFO/bio_parser.utils: The dataset is complete and valid.
      2025-09-15 13:51:45,849 INFO/qwen.evaluate: CER/WER evaluation:
      | Category | ECER (%) | EWER (%) | Support |
      |:---------|:--------:|:--------:|--------:|
      | nom      |  13.38   |  43.82   |      76 |
      | prenom   |  31.83   |  52.99   |      22 |
      | total    |  15.83   |  45.20   |      81 |
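The precision, recall and F1 columns of the Nerval table follow the usual definitions, which the sketch below spells out. The function name is hypothetical, and only the support value (1990) comes from the log. The matched and predicted counts are back-calculated from the rounded percentages of the nom row, so treat them as illustrative rather than as figures reported by the tool.

```python
def precision_recall_f1(matched, predicted, support):
    """Return precision, recall and F1 in percent.

    matched:   predicted entities that were matched to a label entity
    predicted: total number of predicted entities
    support:   total number of label entities
    """
    precision = matched / predicted if predicted else 0.0
    recall = matched / support if support else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return 100 * precision, 100 * recall, 100 * f1


# Assumed counts for the "nom" category (only support=1990 appears in the log).
p, r, f1 = precision_recall_f1(matched=1133, predicted=2017, support=1990)
print(f"Precision={p:.2f} Recall={r:.2f} F1={f1:.2f}")
```

With these assumed counts the rounded values agree with the nom row (56.17, 56.93, 56.55).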
Evaluate a model with custom Nerval threshold
Nerval aligns label and predicted entities by calculating the Character Error Rate (CER) between them. The default threshold is 0 (strict).
To use a different threshold, set --nerval-threshold.
- Command to use:

      teklia-qwen evaluate --labels test.jsonl \
          --predictions predict_test.json \
          --entities \
          --nerval-threshold 0.3

- Output:

      2025-09-15 13:52:42,407 INFO/qwen.evaluate: Summary:
      | Set  | CER (%) | WER (%) | Format Score (%) | N samples |
      |:----:|:-------:|:-------:|:----------------:|:---------:|
      | test |  9.16   |  25.85  |      100.0       |    81     |
      2025-09-15 13:52:42,408 INFO/bio_parser.utils: Loading labels...
      2025-09-15 13:52:42,415 INFO/bio_parser.utils: Loading prediction...
      2025-09-15 13:52:42,422 INFO/bio_parser.utils: The dataset is complete and valid.
      2025-09-15 13:52:42,526 INFO/qwen.evaluate: Nerval evaluation:
      | Category | Precision (%) | Recall (%) | F1 (%) | Support |
      |:---------|:-------------:|:----------:|:------:|:-------:|
      | nom      |     80.71     |   81.81    | 81.26  |  1990   |
      | prenom   |     59.52     |   59.06    | 59.29  |   254   |
      | total    |     78.36     |   79.23    | 78.79  |  2244   |
      2025-09-15 13:52:42,527 INFO/bio_parser.utils: Loading labels...
      2025-09-15 13:52:42,533 INFO/bio_parser.utils: Loading prediction...
      2025-09-15 13:52:42,539 INFO/bio_parser.utils: The dataset is complete and valid.
      2025-09-15 13:52:42,562 INFO/qwen.evaluate: CER/WER evaluation:
      | Category | ECER (%) | EWER (%) | Support |
      |:---------|:--------:|:--------:|--------:|
      | nom      |  13.38   |  43.82   |      76 |
      | prenom   |  31.83   |  52.99   |      22 |
      | total    |  15.83   |  45.20   |      81 |
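Raising the threshold from 0 to 0.3 lifts the total F1 from 54.78 to 78.79, because near-miss transcriptions now count as matches. The following Python sketch illustrates the idea on single entity pairs. It is a simplification, not Nerval's actual alignment procedure, and `entities_match` is a hypothetical helper. The parameter table describes the test as "CER < threshold"; the sketch uses `<=` as an assumption, so that a threshold of 0 still accepts exact matches.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def entities_match(label, prediction, threshold=0.0):
    """Accept a predicted entity if its CER against the label is small enough.

    Assumption: the comparison is inclusive (CER <= threshold).
    """
    cer = edit_distance(label, prediction) / max(len(label), 1)
    return cer <= threshold


# Entity pairs taken from the worst-samples output shown earlier.
pairs = [("Bosc", "Bose"), ("Lanthivine", "Lanthivone"), ("Pons", "Point")]
for label, prediction in pairs:
    strict = entities_match(label, prediction, threshold=0.0)
    relaxed = entities_match(label, prediction, threshold=0.3)
    print(f"{label} / {prediction}: strict={strict} relaxed={relaxed}")
```

"Bosc" and "Bose" differ by one character out of four (CER 0.25), so the pair is rejected at threshold 0 but accepted at 0.3, while "Pons" and "Point" (CER 0.5) are rejected in both cases.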