Dataset analysis
Description
Use the teklia-layout-reader dataset analyze command to compute statistics from a dataset.
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Path to a LayoutReader dataset. | | |
| --report-path | Path to save the Markdown report. | | |
Examples
To analyze the dataset extracted in the previous section, use the following command:
teklia-layout-reader dataset analyze \
finlam_dataset/ \
--report-path finlam_dataset/report.md
This will create a Markdown report in finlam_dataset/report.md:
# Train
## Text statistics
| Metric | Words / zone | Zones / page | Page |
| :------| :----------: | :----------: | :--: |
| Min | 0 | 7 | 1 |
| Max | 897 | 524 | 1 |
| Mean | 23.1 | 217.97 | 1.0 |
| Median | 13.0 | 216.0 | 1.0 |
| Total | 3137181 | 135795 | 623 |
## Coordinates statistics
| Metric | Box width | Box height | Box surface (%) |
| :------| :-------: | :--------: | :-------------: |
| Min | 1 | 1 | 0 |
| Max | 930 | 869 | 41 |
| Mean | 135.47 | 19.11 | 0.3 |
| Median | 128.0 | 12.0 | 0.17 |
# Val
...
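
The figures in the report above can be cross-checked with a short script. The following is a minimal sketch, assuming the report keeps the pipe-delimited table layout shown in this example; `parse_markdown_table` is a hypothetical helper written for illustration and is not part of teklia-layout-reader. It also assumes, based only on the numbers shown here, that each mean is the corresponding total divided by the next, coarser total.

```python
def parse_markdown_table(lines):
    """Parse a pipe-delimited Markdown table into {row label: {column: value}}.

    Hypothetical helper for illustration; not part of teklia-layout-reader.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in lines
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the alignment row
    return {
        row[0]: {col: float(val) for col, val in zip(header[1:], row[1:])}
        for row in body
    }


# Sample copied from the "Text statistics" table of the Train split above.
REPORT_SAMPLE = """\
| Metric | Words / zone | Zones / page | Page |
| :------| :----------: | :----------: | :--: |
| Min | 0 | 7 | 1 |
| Max | 897 | 524 | 1 |
| Mean | 23.1 | 217.97 | 1.0 |
| Median | 13.0 | 216.0 | 1.0 |
| Total | 3137181 | 135795 | 623 |
"""

stats = parse_markdown_table(REPORT_SAMPLE.splitlines())
total = stats["Total"]

# Assumption: mean zones per page = total zones / total pages,
# and mean words per zone = total words / total zones.
zones_per_page = round(total["Zones / page"] / total["Page"], 2)
words_per_zone = round(total["Words / zone"] / total["Zones / page"], 2)
print(zones_per_page, words_per_zone)
```

For the Train split shown here, both recomputed values agree with the Mean row of the table (217.97 zones per page and 23.1 words per zone). To run the same check on a real report, read the table lines from finlam_dataset/report.md instead of the embedded sample.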