Releases
0.5.0
Released on 17 July 2025 • View on Gitlab
Breaking changes
- The WorkerVersionMixin has been removed. It had been deprecated since version 0.3.7.
- The create_required_types helper has been removed. Workers should call the new create_element_type helper (which calls the CreateElementType endpoint) instead.
- The check_required_types helper has been updated:
  - developers should no longer unpack the values given through the type_slugs argument,
  - this function no longer returns a boolean value.
- This release introduces compatibility with the 1.9.0 release of Arkindex.
Arkindex 1.9.0 related breaking changes
- Arkindex’s API schema is now generated in JSON format instead of YAML. You should update your GitLab CI configuration file to match that change.
- Many entity-related helpers were updated or removed:
  - create_entity and list_corpus_entities have been removed,
  - create_transcription_entity and create_transcription_entities have been updated,
  - list_transcription_entities has a new response payload.
- Two metadata-related helpers were updated:
  - create_metadata has a different signature,
  - create_metadata_bulk expects a different format through the metadata_list argument.
- The cache structure has changed and its version has been bumped to 4:
  - the CachedEntity table has been removed,
  - the CachedTranscriptionEntity.entity attribute has been removed,
  - CachedTranscriptionEntity.type has been added, to store the type of the entity.
Make sure to update the way your workers interact with these functions.
Misc
- The trim_polygon image helper has been patched to prevent an odd behavior that removed non-unique points from the provided polygon.
- Image download and upload helpers now wait longer before retrying their requests, to avoid spamming servers.
- LLM workers should add multiline: true to string type parameters like prompts. Refer to the documentation for more details.
- The resized_images helper has been reworked to:
  - allow users to set the maximum number of pixels not only for the long side but also for the short one,
  - calculate the size in bytes and return the image encoded as a base64 string if needed,
  - resize using a longer list of image ratios.
If you are using this helper, you can find the detailed changes here and update your code accordingly.
0.4.0
Released on 11 December 2024 • View on Gitlab
Breaking changes
- The Arkindex API Client library no longer depends on apistar. Some imports should be updated, most notably:
    # Old way
    from apistar.exceptions import ErrorResponse
    # New way
    from arkindex.exceptions import ErrorResponse
- The method BaseWorker.request has been removed. Developers should rely on BaseWorker.api_client.request instead.
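Code that has to run against both older and newer client versions while the ErrorResponse import above is being migrated could resolve the class at runtime. The following is only a sketch: import_first is a hypothetical helper, not part of the Arkindex libraries, and the two module paths are the ones quoted in the breaking change above.

```python
import importlib
from typing import Any, Sequence


def import_first(module_names: Sequence[str], attribute: str) -> Any:
    """Return `attribute` from the first importable module that defines it.

    Hypothetical helper written for this example only.
    """
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module (or one of its parent packages) is not installed: try the next one.
            continue
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError(f"{attribute} not found in any of: {', '.join(module_names)}")


# Intended usage during the migration (requires one of the two packages):
# ErrorResponse = import_first(
#     ["arkindex.exceptions", "apistar.exceptions"], "ErrorResponse"
# )
```

Once every environment runs the newer client, the plain import from arkindex.exceptions is simpler and should be preferred.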
Arkindex API
- The create_iiif_url helper has been added to create an image from an existing IIIF image by URL, using the CreateIIIFURL endpoint.
- The list_elements helper has been added to list elements in the current project, using the ListElements endpoint.
- The create_element_children helper has been added to link multiple elements to a parent element at once, using the CreateElementChildren endpoint.
- The list_corpus_types helper has been added to list element types in the current project and store them as an instance attribute.
- The download_export helper has been added to download a project SQLite export, using the DownloadExport endpoint.
- The download_latest_export helper has been added to download the latest SQLite export of a project.
- The list_process_elements helper has been added to list the elements of the current process, using the ListProcessElements endpoint.
Processing
- ElementsWorker supports processing dataset sets.
- ElementsWorker now supports Export processes, introduced in the latest Arkindex release.
- Most bulk endpoints now publish their results in batches, to avoid sending overly large queries to Arkindex at once. The default batch size is 50 but a larger value can be set through the batch_size argument of the helper.
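To illustrate what publishing in batches means, here is a minimal sketch of splitting a list of payloads into chunks of at most batch_size items. The make_batches function is hypothetical and is not the helper used by the library; only the default of 50 and the batch_size name come from the note above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

DEFAULT_BATCH_SIZE = 50  # default quoted in the release note above


def make_batches(items: Sequence[T], batch_size: int = DEFAULT_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive chunks of `items`, each holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start : start + batch_size])


# 120 payloads with the default size give three requests of 50, 50 and 20 items.
sizes = [len(batch) for batch in make_batches(list(range(120)))]
```

Each chunk would then be sent in its own API call, which keeps individual requests small at the cost of more round trips.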
Worker template
- Workers now rely on the value set in the mandatory docker.command field of the YAML configuration to know each worker’s command. The CMD statement in the Dockerfile is no longer needed and should be removed.
Documentation
- The section Run your worker locally was updated.
Misc
- Workers and arkindex-base-worker now support Python 3.12.
- A new pre-commit hook that reports test files with too many lines is now added by default in new workers.
- Pillow has an image size limit to avoid "decompression bombs". To still be able to process very large images, this limit can be increased through the ARKINDEX_MAX_IMAGE_PIXELS environment variable.
- Some tools have an image disk size limit instead of a dimensions limit. When the image is too large, a new function, resized_images, can generate downsized versions of the image, which can be tried in turn until one is small enough in terms of disk size.
- A new helper is available to automatically pluralize some words. This is mostly helpful in the log messages a worker might send. The default behaviour consists in adding an 's' at the end, but some exceptions are supported, like "entity" and "child".
    # Old way
    logger.info(f"Published {transcriptions_count} transcription{'s' if transcriptions_count > 1 else ''}")
    # New way
    from arkindex.utils import pluralize
    logger.info(f"Published {transcriptions_count} {pluralize('transcription', transcriptions_count)}")
- The Teklia CA certificate is no longer needed in the Docker images of the worker. The Dockerfile can be updated accordingly.
      WORKDIR /src
    - # Install curl
    - ENV DEBIAN_FRONTEND=non-interactive
    - RUN apt-get update -q -y && apt-get install -q -y --no-install-recommends curl
      # Install worker as a package
      ...
    - Add archi local CA
    - RUN curl https://assets.teklia.com/teklia_dev_ca.pem > /usr/local/share/ca-certificates/arkindex-dev.crt && update-ca-certificates
    - ENV REQUESTS_CA_BUNDLE /etc/ssl/certs/ca-certificates.crt
- The CLI arguments --element and --elements-list were converting the element IDs to different types, uuid.UUID versus str. They now both convert to str.
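The pluralization rule described in the list above (append an 's', with a few irregular words) can be sketched as follows. This is not the library's implementation: pluralize_sketch and its exception table are hypothetical, and treating any count above 1 as plural is an assumption borrowed from the "old way" snippet.

```python
# Hypothetical exception table; the real helper may support other words.
IRREGULAR_PLURALS = {"entity": "entities", "child": "children"}


def pluralize_sketch(singular: str, count: int) -> str:
    """Return the singular form for counts up to 1, a plural form otherwise."""
    if count <= 1:
        return singular
    # Fall back to the default rule: append an "s".
    return IRREGULAR_PLURALS.get(singular, singular + "s")


message = f"Published 3 {pluralize_sketch('transcription', 3)}"
```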
0.3.7post1
Released on 23 May 2024 • View on Gitlab
0.3.7
Released on 16 April 2024 • View on Gitlab
Breaking changes
- This release updates the internal behavior of DatasetWorker, meant to process dataset sets, to accommodate the changes introduced by Arkindex 1.6.0.
- The create_metadatas helper has been renamed to create_metadata_bulk. Make sure to update existing imports.
- The model version configuration and the user configuration are now updated at the very end of ElementsWorker.configure and DatasetWorker.configure. This means that there is no need to do it in workers.
# worker.py
class MyWorker(ElementsWorker):
    def configure(self):
-       # Retrieve the model configuration
-       if self.model_configuration:
-            self.config.update(self.model_configuration)
-
-       # Retrieve the user configuration
-       if self.user_configuration:
-           self.config.update(self.user_configuration)
        # Rest of configuration
        ...
Project architecture
- The migration started in 0.3.6 is now finished and all project dependencies are now stored in pyproject.toml for both arkindex-base-worker and new workers, through the template.
Arkindex API
- The create_classifications helper has been updated to use the right parameter of the CreateClassifications endpoint. Missing ML classes are now created automatically, as in create_classification.
- The DatasetMixin has been updated following changes to Arkindex’s dataset processes.
- The details of the loaded model are now always stored in the model_details attribute.
- The TrainingMixin exposes a new property, is_finetuning, to know if the worker has a model version set. This is helpful for training workers, to know if they are fine-tuning an existing model.
- Arkindex has deprecated the usage of worker_version in many endpoints. This change has been reflected in the affected endpoints. Support for the equivalent worker_run argument has been added where it was missing.
- The load_parents parameter is now exposed on the list_element_metadata helper.
- There is an issue with the ValidateModelVersion endpoint in the latest Arkindex releases. This endpoint may return HTTP errors (codes 403 or 500) even though the model version has been successfully updated. To avoid raising false errors, a warning is now logged when that happens and the worker’s processing no longer stops at that exception.
Worker template
- The worker template has been updated:
  - default values for author and email,
  - worker Docker images have been renamed to make registry cleanup policies easier to write:
    - tags are now named after the commit SHA: commit-$CI_COMMIT_SHORT_SHA (see GitLab’s documentation to learn about this variable),
    - the corresponding cleanup policy regex is commit-.*,
  - the type key in YAML configurations has been removed.
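As a quick check of the tag naming scheme above, the sketch below builds a tag from a short SHA and tests it against the cleanup regex. The function names are hypothetical, and matching the regex against the whole tag is an assumption about how the registry applies cleanup policies.

```python
import re

CLEANUP_POLICY_REGEX = r"commit-.*"  # regex quoted in the template notes above


def image_tag(short_sha: str) -> str:
    """Build the tag name used for a commit image (commit-$CI_COMMIT_SHORT_SHA)."""
    return f"commit-{short_sha}"


def matches_cleanup_policy(tag: str) -> bool:
    """Return True when the whole tag matches the cleanup regex (assumed full match)."""
    return re.fullmatch(CLEANUP_POLICY_REGEX, tag) is not None


commit_tag = image_tag("1a2b3c4d")
```

Under that assumption, commit images are eligible for cleanup while release tags such as 0.3.7 are left alone.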
Documentation
- A new section explaining how to publish a worker to an Arkindex instance has been added.
Misc
- A summary message is now logged at the end of the run method, even if no error was encountered during processing.
- A new helper was added to parse source arguments, mostly used for worker_version and worker_run arguments. To filter manual sources, the Arkindex API expects the False value. This helper maps "manual" to this value.
- A new helper to upload a Pillow image has been added.
- SSL verification is now skipped for Arkindex local development hosts. This only affects instances whose URL matches the pattern *ark.localhost.
- A warning is now logged when calling a helper that doesn’t support cache.
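The SSL rule in the list above can be pictured with a small host check. This is a sketch only: should_verify_ssl is a hypothetical function, and applying the *ark.localhost pattern to the URL's hostname (rather than to the full URL) is an assumption.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

LOCAL_HOST_PATTERN = "*ark.localhost"  # pattern quoted in the release note above


def should_verify_ssl(base_url: str) -> bool:
    """Return False for local development hosts, True for every other host."""
    # urlsplit().hostname drops the scheme, port and path; it is None for malformed URLs.
    hostname = (urlsplit(base_url).hostname or "").lower()
    return not fnmatchcase(hostname, LOCAL_HOST_PATTERN)


verify_local = should_verify_ssl("https://ark.localhost/api/v1/")
verify_remote = should_verify_ssl("https://arkindex.example.com/api/v1/")
```

The resulting boolean is the kind of value an HTTP client accepts for its certificate verification setting.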
0.3.6
Released on 22 December 2023 • View on Gitlab
Breaking changes
- The arkindex_worker.git module was removed. It was not used locally by any worker and only exposed some workflows from python-gitlab. Please refer to their documentation if your worker needs to communicate with a Git instance.
- Following Arkindex’s 1.5.3 release, the model_usage configuration parameter has been updated to a tri-enum. To migrate your workers:
  - model_usage: false becomes model_usage: disabled
  - model_usage: true becomes model_usage: required
  The supported value means that the model is supported by a worker but not required to make it work.
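The model_usage migration above amounts to a two-entry mapping from the old booleans to the new values. The sketch below models it in Python; the ModelUsage enum and migrate_model_usage function are hypothetical names used for illustration, and only the three values come from the note above.

```python
from enum import Enum
from typing import Union


class ModelUsage(Enum):
    """The three values accepted by model_usage after the change."""

    DISABLED = "disabled"
    SUPPORTED = "supported"
    REQUIRED = "required"


def migrate_model_usage(value: Union[bool, str]) -> ModelUsage:
    """Convert a legacy boolean, or an already migrated string, to the tri-enum."""
    if isinstance(value, bool):
        return ModelUsage.REQUIRED if value else ModelUsage.DISABLED
    # Raises ValueError for anything other than the three supported strings.
    return ModelUsage(value)
```

There is no legacy equivalent of supported, so that value has to be chosen explicitly by the worker's author.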
Project architecture
- PEP 621 encourages users to store most of the package’s metadata in pyproject.toml. We followed this recommendation both for the arkindex-worker package and the worker template.
Arkindex API
- The details of the model available to the worker are now stored under the model_details attribute.
- The list_corpus_entities API helper now stores the entities in the entities attribute instead of returning them.
- A reminder was added to prevent making changes to the Arkindex cache schema without bumping the version of said cache.
- Each dataset’s archive is now properly deleted after processing.
- The path to a Dataset's archive is now stored under the filepath property.
- The new create_element_parent API helper allows creating a link between two elements.
- The create_sub_element helper was updated to support creating child elements without zones and under a parent without a zone.
- A new user configuration type was introduced to be able to select Arkindex models. Learn more about it in the documentation.
Worker template
- When the provided slug had more than one word, it was invalid for either:
  - the package name, because the user used _ as word delimiter,
  - the module directory’s name, because the user used - as word delimiter.
  The package name and the module directory’s name are now both computed from the slug, making sure that:
  - the package name uses - as word delimiter,
  - the module directory’s name uses _ as word delimiter.
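The naming rule above can be sketched as a small function that derives both names from one slug. This is an illustration only: names_from_slug is hypothetical, the template itself may compute these names differently, and lowercasing the slug is an added assumption.

```python
import re
from typing import Tuple


def names_from_slug(slug: str) -> Tuple[str, str]:
    """Return (package_name, module_name) computed from a worker slug."""
    # Split on any run of dashes, underscores or whitespace.
    words = [word for word in re.split(r"[-_\s]+", slug.strip().lower()) if word]
    if not words:
        raise ValueError("slug must contain at least one word")
    package_name = "-".join(words)  # e.g. the name used when installing the package
    module_name = "_".join(words)  # e.g. the importable directory name
    return package_name, module_name


package_name, module_name = names_from_slug("my_great-worker")
```

Whichever delimiter the user typed, both names end up valid for their respective use.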
Documentation
- A link to the documentation was added:
  - in the README,
  - as a GitLab badge on the repo.
- Some sections in the documentation were renamed to improve readability.
0.3.5
Released on 8 November 2023 • View on Gitlab
Breaking changes
- The arkindex_worker.reporting module has been removed as the JSON report file was no longer needed.
- The --model-dir CLI argument was renamed to --extras-dir as it was more suited to its use. This folder now stores dataset archives, hence the more generic name.
Arkindex API
- Following the Arkindex 1.5.2 release:
  - new helpers for Task-related endpoints were introduced,
  - a new worker class is available, to support Dataset processes,
  - new helpers for Dataset-related endpoints were introduced.
- A unicity check was added on the input of the create_transcription_entities helper.
- The partial_update_element helper was updated to better match the endpoint.
Documentation
- Some modules were poorly displayed in the documentation. Class methods are now only listed under their class’s section.
Release Management
- A Makefile was added to the worker template to deploy new releases more easily. It expects the default branch to be master; make sure to change it to main if your settings differ.
- The base image used in the worker’s Docker image was changed from python:3.11 to python:3.11-slim, in an effort to reduce its size.
Misc
- During the configuration stage, a summary of the worker is now logged instead of the revision’s hash. This was changed to support workers not linked to any revision on Arkindex.
- A retry mechanism on HTTP 50x errors was added. Additionally, when the requested size exceeds the maximum size allowed by the IIIF server, a new attempt is made with max instead of full as the size parameter. More information about these parameters is available in the IIIF documentation.
- When running the worker locally without the ARKINDEX_CORPUS_ID variable set in the environment, an explicit exception is raised when trying to access the corpus_id attribute.
- This release adds support for Python 3.12.
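The fallback from full to max described in the list above can be sketched as follows. Everything here is illustrative: SizeTooLarge, iiif_image_url and download_image are hypothetical names, the fetch callable stands in for the real HTTP download, and the URL layout follows the IIIF Image API pattern {base}/{region}/{size}/{rotation}/{quality}.{format}.

```python
from typing import Callable


class SizeTooLarge(Exception):
    """Hypothetical error raised by `fetch` when the server rejects the requested size."""


def iiif_image_url(base_url: str, size: str) -> str:
    """Build an image URL for the full region with the given size parameter."""
    return f"{base_url.rstrip('/')}/full/{size}/0/default.jpg"


def download_image(base_url: str, fetch: Callable[[str], bytes]) -> bytes:
    """Request the image with size `full`, then retry once with size `max`."""
    try:
        return fetch(iiif_image_url(base_url, "full"))
    except SizeTooLarge:
        # The server caps the deliverable size: ask for the largest size it allows.
        return fetch(iiif_image_url(base_url, "max"))
```

Passing fetch as a parameter keeps the sketch free of network calls; a real worker would combine this with the retry on 50x errors mentioned in the same item.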
0.3.4
Released on 14 September 2023 • View on Gitlab
- The worker template was updated to correctly install Git submodules if the worker depends on any.
- Base-worker now uses ruff for linting. This tool replaces isort and flake8.
- New Arkindex API helper to update an element, calling PartialUpdateElement.
- New Arkindex API helper to list an element’s parents, calling ListElementParents.
- The Worker Activity API is now disabled when the worker runs in read-only mode instead of relying on the --dev CLI argument. The update_activity API helper was updated following Arkindex 1.5.1 changes.
- Workers can now resize the image of an element when opening it. This uses the IIIF resizing API.
0.3.3
Released on 26 May 2023 • View on Gitlab
- The Timer class previously defined in arkindex_worker.utils was removed as it was already defined in Teklia’s Python toolbox.
    # Old usage
    from arkindex_worker.utils import Timer
    # New usage
    from teklia_toolbox.time import Timer
- The create_element_transcriptions API helper now accepts an element_confidence float field in the dictionaries provided through the transcriptions field. This confidence will be set on the created element.
- More query filters are available on the list_element_children API helper. More details about their usage are available in the documentation:
  - transcription_worker_version
  - transcription_worker_run
  - with_metadata
  - worker_run
- Arkindex Base-Worker now fully uses pathlib to handle filesystem paths, as suggested by PEP 428.
- Many helpers were added to handle ZSTD and TAR archives as well as to delete files cleanly. More details are available in the documentation of the arkindex_worker.utils module.
- A bug affecting the parsing of the configuration of workers that use a machine learning model stored on an Arkindex instance was fixed.
0.3.2
Released on 8 March 2023 • View on Gitlab
- A helper to use the new API endpoint to create transcription entities more efficiently was implemented.
- Training workers may now publish a model configuration when creating a new model version on Arkindex. This will make the execution of a generic worker much smoother.
- The model version API endpoints were updated in the latest Arkindex release and a new helper was subsequently introduced. However, there are no breaking changes and the main helper, publish_model_version, still has the same signature and behaviour.
- The latest Arkindex release changed the way NER entities are stored and published:
  - the EntityType enum was removed as type slugs are no longer restricted to a small set of options,
  - create_entity now expects a type slug as a string,
  - a new helper, list_corpus_entity_types, was added to load the entity types in the corpus,
  - a new helper, check_required_entity_types, was added to make sure that needed entity types are available in the corpus. Missing ones are created by default (this can be disabled).
- The create_classifications helper now expects the UUID of each MLClass instead of its name.
- In developer mode, the only way to set the corpus_id attribute is to use the ARKINDEX_CORPUS_ID environment variable. When it’s not set, all API requests using the corpus_id as path parameter will fail with a 500 status code. A warning log was added to help developers troubleshoot this error by advising them to set this variable.
- The create_transcriptions helper no longer makes the API call in developer mode. This behaviour aligns with all other publication helpers.
- The hash computation when publishing a model using publish_model_version was fixed.
- If a process is linked to a model version, its ID will be available to the worker through its model_version_id attribute.
- The URLs of the API endpoints related to Ponos were changed in the latest Arkindex release. Some changes were needed in the test suite.
- The classes attribute now directly contains the classes of the corpus of the processed element.
# Old usage
self.classes = {
    "corpus_id": {
        "ml_class_1": "class_uuid",
        ...
    }
}
# New usage
self.classes = {
    "ml_class_1": "class_uuid",
    ...
}
0.3.1
Released on 8 November 2022 • View on Gitlab
- A breaking change, affecting mostly the API, was introduced in Arkindex’s 1.3.4 release:
  - workers were mostly unaffected but the REST schema was updated.
- Workers will progressively lose the ability to publish results with a worker_version_id on Arkindex. They will have to use a related but more general field, worker_run_id:
  - most publication API endpoint helpers have been updated accordingly,
  - a new version of the cache was released with the updated Django models.
- Improvements to our Machine Learning training API to allow workers to use models published on Arkindex.
- Support workers that have no configuration.
- Allow publishing metadata with falsy but non-null values.
- Add .polygon attribute shortcut on Element.
- Add a major test speedup on our worker template.
- Support cache usage on our metadata API endpoint helpers.
- Drop support for Python 3.6 and add support for Python 3.11.
- Update arkindex-client to version 1.0.11.
- Update shapely to version 1.8.5-post1.
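The item above about falsy but non-null metadata values comes down to the difference between a truthiness test and an explicit None test. The sketch below shows that difference with a hypothetical is_publishable check; it is not the validation code of the library, and which falsy values the real helper accepts (for example the empty string) is an assumption.

```python
from typing import Any


def is_publishable(value: Any) -> bool:
    """Accept every value except None, so 0, 0.0 and False are kept."""
    return value is not None


def is_publishable_truthy(value: Any) -> bool:
    """Overly strict variant: a plain truthiness test also rejects 0, 0.0 and False."""
    return bool(value)


candidates = [0, 0.0, False, None, 42]
kept = [value for value in candidates if is_publishable(value)]
```

With the truthiness test only 42 would survive, which is why a numeric metadata equal to 0 could not be published before this change.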
0.3.0
Released on 12 September 2022 • View on Gitlab
- A large refactoring effort was made on the worker initialization, to streamline most of the workflow:
  - developer setup is now done in a dedicated method, configure_for_developers,
  - cache setup is now done in a dedicated method, configure_cache,
  - the useless features attribute was deprecated,
  - a simpler debug mode was added for developers,
  - the worker now depends only on the Arkindex RetrieveWorkerRun API to get all the information needed, instead of relying on multiple API calls,
  - the ARKINDEX_CORPUS_ID environment variable is no longer used, replaced by corpus information from the API, except for developers,
  - defaults are no longer erased when reading the configuration.
- Support new Machine Learning training APIs on Arkindex to allow workers to create model versions and publish them as zstandard archives on a remote S3-compatible bucket.
- Add API helpers:
  - list_corpus_entities
  - create_metadatas
  - list_metadata
  - list_transcription_entities
  - create_required_types
  - publish_model_version
  - create_model_version
  - upload_to_s3
- Create missing element types when checking if they are available on the Arkindex instance (disabled by default).
- Update arkindex-client to version 1.0.9.
- Update automated rotation code (revert_orientation) to support reverse application.
0.2.4
Released on 6 July 2022 • View on Gitlab
- Document source code using Sphinx and docstrings with parameters. Documentation is available here.
- Update the worker’s inner config with default values from user_configuration
- Support confidence in API helpers create_sub_element and create_elements as they are now available in Arkindex
- Port rotation code from the tesseract worker
- Add a helper to trim polygons so that they fit inside their image
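One simple way to picture trimming a polygon so that it fits inside its image is to clamp every point to the image bounds. The sketch below does only that: clamp_polygon is a hypothetical function, not the helper shipped in this release, and treating width and height as inclusive upper bounds is an assumption.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def clamp_polygon(polygon: List[Point], width: int, height: int) -> List[Point]:
    """Move every point that lies outside the image back onto its border."""
    return [
        (min(max(x, 0), width), min(max(y, 0), height))
        for x, y in polygon
    ]


trimmed = clamp_polygon([(-5, 10), (120, 10), (120, 90), (-5, 90)], width=100, height=80)
```

Clamping can collapse several points onto the same coordinates, so a production helper would also need to decide what to do with the resulting duplicates.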
0.2.3
Released on 28 March 2022 • View on Gitlab
- Update arkindex-client to version 1.0.8.
- Replace all transcription scores with confidences (also renamed on Arkindex)
- Support cache versioning and detect compatibility in workers
- Support confidence in create_transcription_entity API helper
- Support text orientation for transcriptions
- Return the response payload in all creation helpers so that workers can use them
- Support new metadata type URL
0.2.2
Released on 17 September 2021 • View on Gitlab
- Update arkindex-client to version 1.0.7.
- Detect already processed elements using worker activity, and skip them
- Support rotation, mirroring and fix image crop in the open_image method used by a lot of workers
- Change default value for user_configuration from None to {}, which simplifies usage code in workers
- Support new metadata type Numeric
- Add API helper create_classifications
- Set worker version in transcription entities API helpers
0.2.1
Released on 30 June 2021 • View on Gitlab
- Add API helper check_required_types
- Add a developer mode via the --dev argument to simplify boot process for local development
- Send process_id when updating worker activities
- Remove nb_best from ML classes list as it’s not supported anymore by Arkindex
0.2.0
Released on 6 May 2021 • View on Gitlab
This is a larger release which brings a new caching system to share data across workers (avoiding a lot of API calls in some workflows), and splits the codebase into multiple files for helpers & unit tests (one file per topic).
- Add cache system using a local SQLite database, shared from worker to worker. Currently supports Arkindex models:
  - elements and their hierarchy,
  - transcriptions,
  - images,
  - classifications,
  - entities.
- Add API helpers:
  - create_elements
  - create_transcriptions
  - create_transcription_entity
- Split ElementsWorker API helpers and unit tests in sub files
- Drop TranscriptionType & DataSource as they are not used anymore in Arkindex
- Retry all managed API calls that result in a 50x
0.1.14
Released on 8 April 2021 • View on Gitlab
- Support weak SSL DH key when downloading images (needed for some outdated IIIF servers with old SSL certs).
0.1.13
Released on 2 March 2021 • View on Gitlab
- Support new Arkindex feature Worker Activity, to track process progress.
- Add new API helpers:
  - list_element_children
  - list_transcriptions
  - create_metadata
- Extend git support with merge & rebase operations
- Allow any worker type in cookiecutter template
0.1.12
Released on 8 December 2020 • View on Gitlab
- Bugfix to avoid loading remote images from local file system
- Deprecate TranscriptionType.
0.1.11
Released on 26 November 2020 • View on Gitlab
- Update arkindex-client to version 1.0.6.
0.1.10
Released on 23 November 2020 • View on Gitlab
- Support git base operations to allow workers to clone and checkout repositories
- Setup automated CI task to update Python dependencies
- Update arkindex-client to version 1.0.5.
0.1.9
Released on 19 October 2020 • View on Gitlab
- Update arkindex-client to version 1.0.4.
- Add API helpers:
  - get_worker_version
  - get_worker_version_slug
  - get_ml_result_slug
0.1.8
Released on 30 September 2020 • View on Gitlab
- Update arkindex-client to version 1.0.3.
0.1.7
Released on 30 September 2020 • View on Gitlab
- Support Arkindex secrets for workers, using API but also local storage for developers. More information on Arkindex documentation.
- Do not crash when a worker tries to create a classification that already exists.
0.1.6
Released on 23 September 2020 • View on Gitlab
- Automatically create missing Arkindex ML classes when using get_ml_class_id and creating classifications through API helpers.
- Update arkindex-client to version 1.0.2.
0.1.5
Released on 22 September 2020 • View on Gitlab
- Update arkindex-client to version 1.0.1.
- Bugfix on score & confidence type checks in API helpers
0.1.4
Released on 2 September 2020 • View on Gitlab
- Load worker configuration from Arkindex API, or local file (for developers)
- Add API helpers:
  - load_corpus_classes
  - get_ml_class_id
0.1.3
Released on 25 August 2020 • View on Gitlab
- Add API helper create_element_transcriptions
- Return created instance ID in API helpers
- Add cookiecutter variables to be able to easily rebuild
0.1.2
Released on 19 August 2020 • View on Gitlab
- Use WORKER_VERSION_ID environment var in helper methods to identify the worker automatically
- Add API helpers:
  - create_transcription
  - create_classification
  - create_entity
- Extend cookiecutter template to generate clean Python packages
- Add the Timer helper class in tools submodule
0.1.1
Released on 7 August 2020 • View on Gitlab
- Add API helper create_sub_element
- Add unit tests in cookiecutter template & base project.
- Change cookiecutter base to use ElementsWorker
0.1.0
Released on 21 July 2020 • View on Gitlab
Initial version of the base worker, with cookiecutter support to easily create workers using this project.