Dataset analysis

Description

Use the `teklia-layout-reader dataset analyze` command to compute statistics on a LayoutReader dataset and save them as a Markdown report.

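| Parameter        | Description                        | Type           | Default |
| :--------------- | :--------------------------------- | :------------- | :------ |
| `--dataset-path` | Path to a LayoutReader dataset.    | `pathlib.Path` |         |
| `--report-path`  | Path to save the Markdown report.  | `pathlib.Path` |         |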

Examples

To analyze the dataset extracted in the previous section, use the following command:

```shell
teklia-layout-reader dataset analyze \
    finlam_dataset/ \
    --report-path finlam_dataset/report.md
```

This will create a Markdown report in finlam_dataset/report.md:

```markdown
# Train

## Text statistics

| Metric | Words / zone | Zones / page | Page |
| :------| :----------: | :----------: | :--: |
| Min    |      0       |      7       |  1   |
| Max    |     897      |     524      |  1   |
| Mean   |     23.1     |    217.97    | 1.0  |
| Median |     13.0     |    216.0     | 1.0  |
| Total  |   3137181    |    135795    | 623  |

## Coordinates statistics

| Metric | Box width | Box height | Box surface (%) |
| :------| :-------: | :--------: | :-------------: |
| Min    |     1     |     1      |        0        |
| Max    |    930    |    869     |        41       |
| Mean   |   135.47  |   19.11    |       0.3       |
| Median |   128.0   |    12.0    |       0.17      |

# Val

...
```
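The exact computation behind the report is not described here. As a rough illustration of what each row means, the following Python sketch reduces a list of per-item values (for example, the word count of every zone) to the Min, Max, Mean, Median and Total rows, and renders them in the same table layout as the report. The function names `summarize`, `box_surface_percent` and `markdown_table` are hypothetical and not part of teklia-layout-reader; rounding the mean to two decimals and defining box surface as the box area divided by the page area are assumptions.

```python
"""Hypothetical sketch of the per-split statistics shown in the report."""
from statistics import mean, median


def summarize(values: list[float], with_total: bool = True) -> dict[str, float]:
    """Compute the Min/Max/Mean/Median(/Total) rows for one report column."""
    stats = {
        "Min": min(values),
        "Max": max(values),
        # Assumption: the mean is rounded to two decimals, as in the sample report.
        "Mean": round(mean(values), 2),
        "Median": float(median(values)),
    }
    if with_total:
        # The sample report only shows a Total row for the text statistics.
        stats["Total"] = sum(values)
    return stats


def box_surface_percent(box_w: float, box_h: float, page_w: float, page_h: float) -> float:
    """Assumed definition: box area as a percentage of the page area."""
    return 100 * (box_w * box_h) / (page_w * page_h)


def markdown_table(columns: dict[str, dict[str, float]]) -> str:
    """Render metrics as rows and features as columns, like the report tables."""
    names = list(columns)
    # All columns of one table share the same metric rows.
    metrics = list(next(iter(columns.values())))
    lines = [
        "| Metric | " + " | ".join(names) + " |",
        "| :------| " + " | ".join(":---:" for _ in names) + " |",
    ]
    for metric in metrics:
        cells = " | ".join(str(columns[name][metric]) for name in names)
        lines.append(f"| {metric} | {cells} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from the report.
    words_per_zone = [0, 5, 13, 40]
    zones_per_page = [7, 12]
    print(markdown_table({
        "Words / zone": summarize(words_per_zone),
        "Zones / page": summarize(zones_per_page),
    }))
```

Run as a script, this prints a five-row table for the toy values. For columns that, like those in the coordinates table, have no Total row, call `summarize(values, with_total=False)`.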