Development
DAN relies on several tools during its development.
Linter
Code syntax is analyzed before the code is submitted.
To run the linting suite, you may use pre-commit:
pip install pre-commit
pre-commit run -a
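pre-commit can also be installed as a Git hook so that the suite runs automatically on every commit, and run without -a to check only the staged files; both are standard pre-commit features:

# run the suite automatically on each commit
pre-commit install
# lint only the files currently staged, which is faster than -a
pre-commit run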
Tests
Unit tests
pip install tox
tox
To recreate the tox virtual environment (e.g. after a dependency update), you may run tox -r.
Run a single test module: tox -- <test_path>
Run a single test: tox -- <test_path>::<test_function>
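For instance, assuming a module named tests/test_training.py containing a test_train function (both names are illustrative, not actual paths in the repository):

# run every test in one module
tox -- tests/test_training.py
# run a single test function from that module
tox -- tests/test_training.py::test_train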
The tests use a large file stored via Git-LFS. Make sure to run git-lfs pull before running them.
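If Git-LFS is not yet set up in your clone, the standard sequence is:

# one-time setup of the Git-LFS filters
git lfs install
# download the large test file
git lfs pull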
Commands
As the unit tests do not cover everything, it is sometimes necessary to run DAN commands directly to test new developments.
Dataset tokens command
The library already has all the documents needed to run the dataset tokens command on a minimalist dataset. In the tests/data directory, you can run the following command and add any extra parameters you need:
teklia-dan dataset tokens entities.yml
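For instance, from the repository root (in current versions this generates a tokens YAML file to pass to the other commands, which is worth verifying in your checkout):

cd tests/data
teklia-dan dataset tokens entities.yml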
Dataset download command
The library already has all the documents needed to run the dataset download command on a minimalist dataset. In the tests/data/extraction directory, you can run the following command and add any extra parameters you need:
teklia-dan dataset download --output .
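For instance, from the repository root (the names of the generated files depend on the dataset contents):

cd tests/data/extraction
teklia-dan dataset download --output .
# inspect what was written to the current directory
ls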
Dataset language-model command
The library already has all the documents needed to run the dataset language-model command on a minimalist dataset. In the tests/data/prediction directory, you can run the following command and add any extra parameters you need:
teklia-dan dataset language-model --output . --subword-vocab-size 45
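The --subword-vocab-size value can be varied to experiment; for example, from the repository root:

cd tests/data/prediction
# try a larger subword vocabulary than the value shown above
teklia-dan dataset language-model --output . --subword-vocab-size 60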
Dataset analyze command
The library already has all the documents needed to run the dataset analyze command on a minimalist dataset. In the tests/data/training/training_dataset directory, you can run the following command and add any extra parameters you need:
teklia-dan dataset analyze --labels labels.json --output-file analyze.md
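For instance, from the repository root; since the report is written to a .md file, it can be inspected directly:

cd tests/data/training/training_dataset
teklia-dan dataset analyze --labels labels.json --output-file analyze.md
# read the generated report
cat analyze.md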
Training command
The library already has all the documents needed to run the training command on a minimalist dataset. You can use the configuration available at configs/tests.json. It is already populated with the parameters used in the unit tests.
teklia-dan train --config configs/tests.json
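To experiment with parameters without touching the test configuration, you can work on a copy (configs/dev.json is an arbitrary name used here for illustration):

# copy the test configuration before editing it
cp configs/tests.json configs/dev.json
teklia-dan train --config configs/dev.json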
Evaluation command
The library already has all the documents needed to run the evaluation command on a minimalist dataset. You can use the configuration available at configs/eval.json. It is already populated with the parameters used in the unit tests.
teklia-dan evaluate --config configs/eval.json
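Assuming configs/eval.json points at the model produced by the test training run (an assumption worth checking in your checkout), a plausible end-to-end sequence is:

teklia-dan train --config configs/tests.json
teklia-dan evaluate --config configs/eval.json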
Predict command
The library already has all the documents needed to run the predict command with a minimalist model. In the tests/data/prediction directory, you can run the following command and add any extra parameters you need:
teklia-dan predict \
--image-dir images/ \
--image-extension png \
--model . \
--output /tmp/dan-predict
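Here --model . resolves to tests/data/prediction, since the command is run from that directory. To check the result, list the output directory; predictions are typically written as one file per input image, though the exact naming and format depend on the DAN version:

ls /tmp/dan-predict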
Convert command
If you want to evaluate a NER model with your own scripts, you can convert DAN's predictions to BIO format using the convert command.
teklia-dan convert /tmp/dan-predict --tokens tokens.yml --output /tmp/dan-convert
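In BIO format, each token is tagged as B- (beginning of an entity), I- (inside an entity), or O (outside any entity). The output might look like the lines below, though the entity names come from your tokens.yml and the exact layout depends on the DAN version:

Paris B-Location
, O
le O
13 B-Date
mai I-Date
1897 I-Date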
Documentation
This documentation is written in AsciiDoc and generated by Antora.
Setup
Install the required dependencies with:
npm install
Build the documentation using make antora. You can then write AsciiDoc in the relevant docs/*.adoc files, and see the output at file:///path/to/the/repo/public/index.html.
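A minimal build-and-view sequence, assuming a standard desktop environment (xdg-open can be replaced by any browser):

npm install
make antora
# open the generated site locally
xdg-open public/index.html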