Dataset analysis
Description
Use the `teklia-dan dataset analyze` command to analyze a dataset. This will display statistics in Markdown format.
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Path to the | | |
| | Path to the | | |
| `--output-file` | Where the summary will be saved. | | |
| | Keys and values to use to initialise your experiment on W&B. See the full list of available keys on the official documentation. | | |
Weights & Biases logging
To log your statistics file on Weights & Biases (W&B), you need to:
- log in to W&B via `wandb login`
Resume run
To make sure that your statistics file is linked to your DAN training, we strongly recommend that you either reuse the `wandb.init` parameter from your DAN training configuration or define these two keys:
- `id`, with a unique ID that has never been used on your W&B project. We recommend generating a random 8-character string composed of letters and numbers using the Short Unique ID (UUID) Generating Library.
- `resume`, with the value `auto`.
The final configuration should look like:
{
"id": "<unique_ID>",
"resume": "auto"
}
Otherwise, W&B will create a new run when you publish your statistics file.
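The `id` value described above can also be produced with a few lines of Python. The following is a minimal sketch, not part of DAN or W&B: the helper names `generate_run_id` and `build_resume_config` are hypothetical, and it uses the standard library instead of the Short Unique ID library. A random ID is very unlikely to collide, but the sketch does not check that the ID is unused on your W&B project.

```python
import json
import secrets
import string

# Characters allowed in the ID: ASCII letters and digits.
ALPHABET = string.ascii_letters + string.digits


def generate_run_id(length: int = 8) -> str:
    """Return a random ID made of ASCII letters and digits (hypothetical helper)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def build_resume_config() -> dict:
    """Build the two keys recommended for resuming a W&B run (hypothetical helper)."""
    return {"id": generate_run_id(), "resume": "auto"}


if __name__ == "__main__":
    # Print the configuration as JSON so it can be copied into the command.
    print(json.dumps(build_resume_config(), indent=2))
```

Keep the generated ID: the same value has to be used in your DAN training configuration for both to end up in the same run.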
Offline mode
If you do not have Internet access during the file generation, you can set the `mode` key to `offline` to use W&B’s offline mode. W&B will create a `wandb` folder next to the `--output-file` defined in the command.
The final configuration should look like:
{
"mode": "offline"
}
Once your statistics file is complete, you can publish your W&B run with the `wandb sync` command and the `--append` parameter:
wandb sync --project <wandb_project> --sync-all --append
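If the publication step is scripted, the command above can be assembled programmatically. This is only a sketch: `build_sync_command` is a hypothetical helper, `my-project` is a placeholder project name, and the flags are copied from the command shown above.

```python
import shlex
from typing import List


def build_sync_command(project: str) -> List[str]:
    """Return the `wandb sync` invocation as an argument list (hypothetical helper)."""
    return ["wandb", "sync", "--project", project, "--sync-all", "--append"]


if __name__ == "__main__":
    # Print the command as a shell-safe string instead of running it.
    print(shlex.join(build_sync_command("my-project")))
```

The argument list can be passed to `subprocess.run` once Internet access is available.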
As in online mode, we recommend that you set up your W&B run to be resumed (see the Resume run section).
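As an illustration, the offline and resume keys can be merged into a single configuration. The sketch below assumes that the three keys can be combined in one configuration, since `mode`, `id` and `resume` are all `wandb.init` arguments; `build_offline_config` is a hypothetical helper and `<unique_ID>` is a placeholder to replace with your own ID.

```python
import json


def build_offline_config(run_id: str) -> dict:
    """Combine the offline mode key with the two resume keys (hypothetical helper)."""
    return {"mode": "offline", "id": run_id, "resume": "auto"}


if __name__ == "__main__":
    # "<unique_ID>" is a placeholder, not a real run ID.
    print(json.dumps(build_offline_config("<unique_ID>"), indent=2))
```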