Arkindex 1.12.1

We are happy to announce that a new Arkindex release is available. You can explore Arkindex and try out the newest features on our demo instance, demo.arkindex.org.

Dataset exports

Database exports can now be started for a single dataset instead of a whole project. These exports are intended for training processes, as a replacement for the caching system removed in the previous release.

Dataset exports modal

Introducing these new exports also allowed us to generate the export databases with our Python export library, ensuring that Arkindex always produces exports compatible with it. Previously, small inconsistencies in database constraints or column order could appear.

Processes

This release improves the configuration step of processes, to help with running them at scale.

Switch worker versions

It is now possible to use a different worker version in a process without losing the selected model version, worker configuration or dependencies.

Button to change the worker version

Cloning a process

A new CloneProcess API endpoint has been introduced. From an existing inference or training process, it creates a new one that uses the same element filters or dataset sets, as well as the same worker versions, model versions and worker configurations, without having to use templates.

This feature will be added to the web interface in the next release.
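Until the web interface supports cloning, the endpoint has to be called directly. The sketch below only builds the HTTP request with the standard library; the URL path, the empty request body and the `Token` authorization scheme are assumptions, so check the actual route of CloneProcess in your instance's API documentation before relying on it.

```python
import urllib.request

# Assumption: base URL of the instance's API; replace with your own instance.
API_BASE = "https://demo.arkindex.org/api/v1"


def build_clone_request(process_id: str, token: str) -> urllib.request.Request:
    """Build (without sending) a request asking CloneProcess to duplicate
    an existing inference or training process."""
    url = f"{API_BASE}/process/{process_id}/clone/"  # hypothetical path
    return urllib.request.Request(
        url,
        data=b"",  # assumption: no body needed, the source process holds the settings
        method="POST",
        headers={
            "Authorization": f"Token {token}",  # assumed token authentication
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_clone_request("00000000-0000-0000-0000-000000000000", "my-token")
    print(req.get_method(), req.full_url)
    # Sending it would then be:
    # with urllib.request.urlopen(req) as response:
    #     new_process = response.read()
```

Building the request separately from sending it makes it easy to inspect before pointing it at a real instance.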

Worker types

Worker types can now have custom colors assigned to them. Previously, the Arkindex frontend used hard-coded colors for a few known slugs such as classifier, and blue for any other type.

Worker type colors can be updated by instance administrators using the administration interface.

This release also introduces new API endpoints to manage worker types, including their colors. Worker type management will be made available in the web interface in the next release.

Worker configurations

Workers can now define S3 bucket fields, allowing users to select an S3 bucket just like in S3 ingestion processes.

In the future, this will allow us to make S3 ingestion processes use the configuration of the related system worker directly, making it easier to extend the worker and provide new options.

Additionally, the default value of a non-editable field can now be set to null. Since model versions can now only override fields defined in the worker, this allows workers to define non-editable fields meant to be set by model versions without having to specify a default value.
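To illustrate how these two rules interact, here is a minimal sketch of resolving a configuration from worker fields and model version overrides. The dictionary layout (a `default` and an `editable` key per field) and the `resolve_configuration` function are hypothetical and do not reflect Arkindex's actual schema or code.

```python
from typing import Any

# Hypothetical layout: field name -> {"default": ..., "editable": ...}
WorkerFields = dict[str, dict[str, Any]]


def resolve_configuration(fields: WorkerFields, overrides: dict[str, Any]) -> dict[str, Any]:
    """Start from the worker defaults, then apply the model version overrides.

    Overrides are rejected for fields the worker does not define. A
    non-editable field may default to None, leaving the model version to set it.
    """
    unknown = set(overrides) - set(fields)
    if unknown:
        raise ValueError(f"Model version overrides undefined fields: {sorted(unknown)}")
    config = {name: spec.get("default") for name, spec in fields.items()}
    config.update(overrides)
    return config


fields = {
    "batch_size": {"default": 8, "editable": True},
    "charset": {"default": None, "editable": False},  # meant to be set by model versions
}
print(resolve_configuration(fields, {"charset": "abc"}))
# {'batch_size': 8, 'charset': 'abc'}
```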

Misc

  • A default name is now set when starting a training process, as is already done for inference processes or for imports.

  • In Enterprise Edition, errors that occur during a Docker pull are now cleanly reported in the task logs, making it easier to troubleshoot a task marked as a system error.

  • Fixed an issue that caused spurious warnings about navigation filters when deleting elements, deleting worker results, or populating datasets from the web interface.

API

The following API endpoints have been migrated to the new error format:

The documentation for each of those API endpoints contains a detailed description of the structure of this new format.

Upgrade notes

To upgrade a development instance, follow this documentation.

To upgrade a production instance, you need to:

  • Deploy this release’s Docker image: registry.gitlab.teklia.com/arkindex/backend:1.12.1

  • Run the database migrations: docker exec ark-backend arkindex migrate

  • Update the system workers: docker exec ark-backend arkindex update_system_workers
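For administrators who script their upgrades, the two post-deployment commands above can be chained so that the system workers are only updated once the migrations succeed. This is a minimal sketch assuming the new image is already deployed and the container is named ark-backend as in the commands above; `upgrade` and its `dry_run` flag are hypothetical helpers, not part of Arkindex.

```python
import subprocess

# Container name taken from the commands above; adjust to your deployment.
CONTAINER = "ark-backend"

# Post-deployment steps, in the order they must run.
UPGRADE_STEPS = [
    ["arkindex", "migrate"],
    ["arkindex", "update_system_workers"],
]


def upgrade(container: str = CONTAINER, dry_run: bool = True) -> list[list[str]]:
    """Run each step inside the container, stopping at the first failure.

    With dry_run=True (the default), nothing is executed and the commands
    that would run are only returned.
    """
    commands = [["docker", "exec", container, *step] for step in UPGRADE_STEPS]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a step fails
            subprocess.run(command, check=True)
    return commands
```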

The main changes impacting developers and system administrators are detailed below.

PostgreSQL 15 compatibility

In the previous release, the configuration format database migration could fail when executed on versions of PostgreSQL below 17:

  • In PostgreSQL 15, executing the migration could result in the error 42601: subquery in FROM must have an alias.

  • In PostgreSQL 16, executing the migration could result in the error 42883: function json_table() does not exist.

In this release, the migration has been updated to restore compatibility with PostgreSQL 15, and the fix has also been backported to the previous release. System administrators running Arkindex on PostgreSQL 15 or 16 thus have three options:

  • Migrating from an older release of Arkindex directly to 1.12.1, skipping 1.12.0 entirely;

  • Migrating from an older release of Arkindex to the Docker image tagged as 1.12.0-post1, which includes the backported migration;

  • Skipping this specific migration, which bypasses the issue but could cause data loss for workers still using the older configuration format.
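To check whether an instance is affected, the PostgreSQL version can be read with `SHOW server_version_num;` in psql. The sketch below assumes that integer value has already been retrieved; the function names are hypothetical, and the check only encodes the "below 17" condition stated above.

```python
def postgres_major(server_version_num: int) -> int:
    """Major version from PostgreSQL's server_version_num setting
    (for example 150004 for 15.4). Valid for PostgreSQL 10 and later."""
    return server_version_num // 10000


def plain_1_12_0_migration_may_fail(server_version_num: int) -> bool:
    """True when the server is older than PostgreSQL 17, i.e. when the
    configuration format migration shipped in 1.12.0 could fail and one of
    the options above should be used instead."""
    return postgres_major(server_version_num) < 17


print(postgres_major(150004), plain_1_12_0_migration_may_fail(150004))
# 15 True
```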

To prevent such issues in the future, our automated tests now run all database migrations on every version of PostgreSQL that we officially support.

Updates to database collations

The database migrations in this release include some changes to the Unicode collations used for the slugs of element types, worker types, workers and corpus categories.

These changes make all slugs fully case-insensitive, which may require deduplicating some rows first. For example, two element types with the slugs page and PAGE would be merged into a single page type, and all elements using either type would be updated to point to it. The migrations perform this deduplication automatically.
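The deduplication can be pictured as grouping slugs by their case-folded form, keeping one type per group and remapping elements to it. This is a minimal in-memory sketch, not the actual migration: `str.casefold` only approximates a case-insensitive database collation, and keeping the type with the lowest ID is an arbitrary choice made here for illustration.

```python
from collections import defaultdict


def deduplicate_slugs(
    types: dict[int, str], elements: dict[int, int]
) -> tuple[dict[int, str], dict[int, int]]:
    """Merge types whose slugs differ only by case.

    types maps type ID -> slug, elements maps element ID -> type ID.
    Returns the remaining types and the remapped elements.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for type_id, slug in types.items():
        groups[slug.casefold()].append(type_id)

    # Map every type ID to the ID of the type kept for its group
    canonical: dict[int, int] = {}
    for ids in groups.values():
        keep = min(ids)
        for type_id in ids:
            canonical[type_id] = keep

    kept = {type_id: slug for type_id, slug in types.items() if canonical[type_id] == type_id}
    remapped = {element_id: canonical[type_id] for element_id, type_id in elements.items()}
    return kept, remapped


print(deduplicate_slugs({1: "page", 2: "PAGE", 3: "line"}, {10: 1, 11: 2, 12: 3}))
# ({1: 'page', 3: 'line'}, {10: 1, 11: 1, 12: 3})
```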

Removed commands

Some undocumented management commands have been removed in this release:

  • arkindex clone_process, now doable using the new API endpoint;

  • arkindex delete, equivalent to deleting an element using the web interface;

  • arkindex fake_worker_run, a precursor of user worker runs;

  • arkindex fake_worker_version, also a precursor of user worker runs;

  • arkindex merge_types, a no longer maintained tool to merge two element types into one;

  • arkindex migrate_workers, used to deduplicate workers once we introduced the concept of model versions in Arkindex 1.2.3;

  • arkindex move_lines_to_parent, used for specific internal Teklia projects, which would now be implemented as a worker.