Dataset analysis

Description

Use the teklia-layout-reader dataset analyze command to compute statistics on a dataset in LayoutReader format.

| Parameter | Description | Type | Default |
| :-------- | :---------- | :--- | :------ |
| --dataset-path | Path to a LayoutReader dataset. | pathlib.Path | |
| --report-path | Path to save the Markdown report. | pathlib.Path | "dataset_report.md" |

Examples

To analyze the dataset extracted in the previous section, use the following command:

teklia-layout-reader dataset analyze finlam_dataset/ --report-path report.md

This will create a Markdown report in report.md:

Statistics
==========

# Test.jsonl

## Classes statistics

| Metric | Class ID 4 | Class ID 5 | Class ID 6 |
| :----: | :---------: | :---------: | :--------: |
| Count  |    61664    |     865     |    1262    |

## Coordinates statistics

| Metric | Box width | Box height | Box surface (%) |
| :------| :-------: | :--------: | :-------------: |
| Min    |     1     |     1      |        0        |
| Max    |    932    |    871     |        79       |
| Mean   |   133.29  |   21.66    |       0.33      |
| Median |   128.0   |    14.0    |       0.2       |

## Separators statistics

| Metric | Box width | Box height | Box surface (%) |
| :------| :-------: | :--------: | :-------------: |
| Min    |    -129   |    -32     |        -1       |
| Max    |    932    |    721     |        1        |
| Mean   |    62.1   |   71.84    |       0.01      |
| Median |    2.0    |    55.0    |       0.01      |
...
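
To post-process the report (for instance, to compare splits or flag suspicious values such as negative separator widths), its tables can be read back into Python. The following is a minimal sketch, not part of teklia-layout-reader: the function name `parse_report_tables` is hypothetical, and it assumes the report keeps the layout shown above (one `# <split>` heading per file, `## <section>` headings, and pipe tables). Values are kept as strings.

```python
import re
from typing import Dict, List

# Hypothetical helper, not part of teklia-layout-reader.
# Assumes the report layout shown above: "# <split>" headings,
# "## <section>" headings, each followed by one pipe table.
Report = Dict[str, Dict[str, List[Dict[str, str]]]]


def parse_report_tables(markdown: str) -> Report:
    """Return {split: {section: [row, ...]}}, each row keyed by column header."""
    report: Report = {}
    split = None
    section = None
    header = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # New section inside the current split; its table header comes next.
            section = line[3:].strip()
            header = None
            if split is not None:
                report[split][section] = []
        elif line.startswith("# "):
            # New split, e.g. "# Test.jsonl".
            split = line[2:].strip()
            section = None
            header = None
            report[split] = {}
        elif line.startswith("|") and split is not None and section is not None:
            cells = [cell.strip() for cell in line.strip("|").split("|")]
            if header is None:
                header = cells
            elif all(re.fullmatch(r":?-+:?", cell) for cell in cells):
                continue  # alignment row such as | :----: |
            else:
                report[split][section].append(dict(zip(header, cells)))
    return report


# Excerpt copied from the example report above.
SAMPLE = """\
# Test.jsonl

## Coordinates statistics

| Metric | Box width | Box height | Box surface (%) |
| :------| :-------: | :--------: | :-------------: |
| Min    |     1     |     1      |        0        |
| Max    |    932    |    871     |        79       |
| Mean   |   133.29  |   21.66    |       0.33      |
| Median |   128.0   |    14.0    |       0.2       |
"""

tables = parse_report_tables(SAMPLE)
max_row = tables["Test.jsonl"]["Coordinates statistics"][1]
print(max_row["Box width"])  # prints 932
```

For a real report, replace `SAMPLE` with the file contents, for example `pathlib.Path("report.md").read_text()`.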