Dataset analysis
Description
Use the teklia-dan dataset analyze command to analyze a dataset. This will display statistics in Markdown format.
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Path to the | | |
| | Path to the | | |
| | Where the summary will be saved. | | |
| | Keys and values to use to initialise your experiment on W&B. See the full list of available keys on the official documentation. | | |
Weights & Biases logging
To log your statistics file on Weights & Biases (W&B), you need to:
- log in to W&B via wandb login
Resume run
To make sure that your statistics file is linked to your DAN training, we strongly recommend that you either reuse the wandb.init parameter from your DAN training configuration or define these two keys:
- id, with a unique ID that has never been used in your W&B project. We recommend generating a random 8-character string of letters and digits using the Short Unique ID (UUID) Generating Library.
- resume, with the value auto.
The final configuration should look like:
{
"id": "<unique_ID>",
"resume": "auto"
}
Otherwise, W&B will create a new run when you publish your statistics file.
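As an illustration, the resume configuration can be generated with a few lines of Python. This is a minimal sketch that uses only the standard library instead of the Short Unique ID library recommended above; the helper names make_run_id and resume_config are hypothetical, and the script does not check that the ID is unused in your W&B project.

```python
import json
import secrets
import string

# Letters and digits, matching the recommended ID format.
ALPHABET = string.ascii_letters + string.digits


def make_run_id(length: int = 8) -> str:
    """Return a random identifier made of letters and digits (hypothetical helper)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def resume_config(run_id: str) -> dict:
    """Build the two keys described above: a unique id and resume set to auto."""
    return {"id": run_id, "resume": "auto"}


if __name__ == "__main__":
    # Print the configuration as JSON so it can be copied where needed.
    print(json.dumps(resume_config(make_run_id()), indent=2))
```

Reusing the same printed ID for both the training and the dataset analysis is what keeps the two attached to a single run.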
Offline mode
If you do not have Internet access during the file generation, you can set the mode key to offline to use W&B’s offline mode. W&B will create a wandb folder next to the --output-file defined in the command.
The final configuration should look like:
{
"mode": "offline"
}
Once your statistics file is complete, you can publish your W&B run with the wandb sync command and the --append parameter:
wandb sync --project <wandb_project> --sync-all --append
As in online mode, we recommend configuring your W&B run to be resumed (see the Resume run section).
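The offline workflow can be sketched in Python as follows. This is an illustrative example, not part of DAN: the helper names offline_resume_config and sync_command are hypothetical, as is the project name my-project, and combining the mode, id and resume keys in one configuration is an assumption based on the recommendation above. The sync command itself is the one shown in this section.

```python
import json
import shlex


def offline_resume_config(run_id: str) -> dict:
    """Combine the offline mode key with the two resume keys (assumed to be compatible)."""
    return {"mode": "offline", "id": run_id, "resume": "auto"}


def sync_command(project: str) -> list:
    """Build the argument list for the wandb sync command documented above."""
    return ["wandb", "sync", "--project", project, "--sync-all", "--append"]


if __name__ == "__main__":
    # Configuration to use while generating the statistics file offline.
    print(json.dumps(offline_resume_config("<unique_ID>"), indent=2))
    # Command to run later, once Internet access is available.
    # "my-project" is a placeholder project name.
    print(shlex.join(sync_command("my-project")))
```

Building the command as a list makes it easy to pass to subprocess.run later without any shell quoting issues.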