Arkindex 1.6.2

We are happy to announce that a new Arkindex release is available. You can explore Arkindex and try out the newest features on our demo instance, demo.arkindex.org.

Datasets

A new Populate dataset action on projects and elements lets you populate an empty dataset with a random sample of elements, without needing to use the Arkindex CLI. It provides the same options as the CLI’s ml-splits command.

Populate dataset button
Populate dataset modal

In addition, datasets in a Complete or Error state can now be re-opened using the Reopen button on the dataset details page. This removes any generated artifacts and allows the dataset to be edited before it is built again.

Reopen dataset button
Reopen dataset confirmation modal

Finally, the default dataset set names have been updated to train, dev and test, and are now consistent between the frontend, the API and the CLI.
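
As an illustration of what populating a dataset with a random sample means, here is a minimal sketch that shuffles a list of element IDs and distributes them across the train, dev and test sets. The function name, the ratios and the seed parameter are assumptions made for this example; they are not the actual implementation or defaults of the Populate dataset action or of ml-splits.

```python
import random


def split_elements(element_ids, ratios=None, seed=None):
    """Randomly assign element IDs to dataset sets (illustrative only)."""
    # Hypothetical ratios: the real defaults are not stated in these notes.
    ratios = ratios or {"train": 0.8, "dev": 0.1, "test": 0.1}
    ids = list(element_ids)
    random.Random(seed).shuffle(ids)

    names = list(ratios)
    sets, start = {}, 0
    for index, name in enumerate(names):
        if index == len(names) - 1:
            # The last set takes the remainder, so every element is assigned once.
            end = len(ids)
        else:
            end = start + round(len(ids) * ratios[name])
        sets[name] = ids[start:end]
        start = end
    return sets


sets = split_elements(range(100), seed=0)
print({name: len(ids) for name, ids in sets.items()})
```

Passing a seed makes the sample reproducible, which can be convenient when comparing training runs.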

Processes

This release improves several aspects of process execution and management: GPU management, worker configurations and task execution.

GPU management

Whether or not a GPU will be used is now configurable for each worker in a process, instead of for the whole process. Workers that require a GPU will always use one, and workers that do not support a GPU never will. For workers that support GPU usage but do not require it, users are now free to choose whether or not to use one.

GPU usage on WorkerRuns
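
The per-worker GPU rule described above can be summarised in a few lines. This is only a sketch of the decision logic: the GPUUsage enum, its values and the uses_gpu function are hypothetical names chosen for the example, not Arkindex's own code.

```python
from enum import Enum


class GPUUsage(Enum):
    # Hypothetical names for the three kinds of workers described above.
    DISABLED = "disabled"    # the worker does not support a GPU
    SUPPORTED = "supported"  # the worker can use a GPU but does not need one
    REQUIRED = "required"    # the worker cannot run without a GPU


def uses_gpu(worker_gpu_usage, user_choice=False):
    """Return whether a worker in a process will run with a GPU."""
    if worker_gpu_usage is GPUUsage.REQUIRED:
        return True
    if worker_gpu_usage is GPUUsage.DISABLED:
        return False
    # Only for workers that merely support a GPU does the user's choice matter.
    return bool(user_choice)


print(uses_gpu(GPUUsage.SUPPORTED, user_choice=True))
```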

Worker configurations

When creating a worker configuration, only the required fields are now displayed by default. This makes workers with a large number of options more user-friendly, by letting users focus on what they need to do to run the worker without having to understand every advanced option.

config_required_fields

Additionally, configuration fields used to select a model, mainly found on training workers, now use a modal instead of a text field with a list of suggestions. This makes browsing the models easier and solves some user interface bugs, particularly with large configurations.

Task execution

Various issues have been fixed on tasks running in Community Edition:

  • The Restart task feature, which runs a new task without running the whole process again, now properly runs the new task.

  • Stopping a task now marks it as Stopped rather than Failed.

  • Tasks now only start after any selected model versions are fully downloaded, rather than before.

Additionally, the API endpoints used to manage tasks have been simplified. The RetrieveTaskFromAgent endpoint has been renamed RetrieveTask, and UpdateTaskFromAgent and PartialUpdateTaskFromAgent have been merged into the existing UpdateTask and PartialUpdateTask endpoints.
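
If you maintain scripts that call these endpoints by name, the renaming can be captured in a small lookup table. The mapping below only restates the names given above; the helper function is a hypothetical convenience for this example and is not part of any Arkindex client.

```python
# Old endpoint names mapped to the names to use from this release onwards.
RENAMED_ENDPOINTS = {
    "RetrieveTaskFromAgent": "RetrieveTask",
    "UpdateTaskFromAgent": "UpdateTask",
    "PartialUpdateTaskFromAgent": "PartialUpdateTask",
}


def current_endpoint_name(name):
    """Return the current name of an endpoint; unaffected names are unchanged."""
    return RENAMED_ENDPOINTS.get(name, name)


print(current_endpoint_name("UpdateTaskFromAgent"))
```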

Misc

  • When restarting a task with the Restart task button, the original task that got restarted is now displayed:

Restarted task display

  • The Restart task button is now disabled when a task has already been restarted. You will need to restart the newer task instead.

  • In Enterprise Edition, restarting a task now creates a task without an associated agent and GPU, allowing the task to be assigned to any other agent in the same farm.

  • Process names can now be up to 250 characters long, and errors are now properly displayed when they occur while renaming a process.

  • Creating a process from failed worker activities now runs asynchronously, allowing processes to be created from a much larger number of failures.

Imports

IIIF imports have now been merged into file imports. It is now possible to import images, PDF files, Transkribus collection exports, IIIF manifests, and archives containing any of those, all at once in the same process.

Continuing our work to convert internal Arkindex tasks to workers, S3 imports now run using a separate worker. The upgrade notes describe what this change means for instance administrators.

User management

The profile page has been updated to allow editing your display name and changing your password. The API token is also hidden by default to prevent any leaks during screenshares.

New profile page

The registration and email verification process has been improved. Users who did not receive the verification email can now request a new one. When clicking the confirmation link in an email, errors are now displayed more clearly.

Finally, users who have been registered without a password through the API, and thus cannot log in normally, now see a warning inviting them to set their password through the new profile page.

Cleanup

We have continued our efforts to improve consistency and remove deprecated features from Arkindex:

  • The long-deprecated worker version IDs have been removed from the APIs and from SQLite exports. This means that old Machine Learning results created before Arkindex 1.4.0 will now appear as if they were created manually, instead of by a worker version.

  • Worker versions no longer have Docker image artifacts associated with them. These were only used by Git imports, which were removed in Arkindex 1.6.0.

  • Git repositories no longer have access rights associated with them. Any existing access rights have been transferred to every worker linked to repositories.

  • Classification confidences are no longer optional. Any classification without a confidence set now has its confidence score set to 1.

  • The unique identifier for transcription entities is now a UUID rather than an integer, to be consistent with every other Machine Learning result.

CLI

Misc

  • The ListElements, ListElementChildren and ListElementParents API endpoints now provide a with_transcriptions option, allowing all transcriptions on each element to be fetched similarly to with_metadata or with_classes.

  • The Delete worker results action now also deletes entities when it runs on a project.

  • Project exports that fail because of a database connection issue should now be properly shown as Failed instead of remaining in the Running state.

  • The feedback button in the footer has been replaced by a link to our new support forum.

  • IIIF image checks now use an Arkindex-specific user agent string (Arkindex/1.6.2 (+https://teklia.com/)) rather than a generic Requests one. This can solve issues when importing images from servers that block web scraping.
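
To illustrate the with_transcriptions option mentioned at the top of this list, here is a minimal sketch that builds the query string for a call to one of the element listing endpoints. The function name and the encoding of booleans as "true" are assumptions for this example; check the API documentation of your instance for the exact format it accepts.

```python
from urllib.parse import urlencode


def element_list_query(with_transcriptions=False, with_metadata=False, with_classes=False):
    """Build a query string enabling the requested optional fields."""
    flags = {
        "with_transcriptions": with_transcriptions,
        "with_metadata": with_metadata,
        "with_classes": with_classes,
    }
    # Assumption: enabled options are sent as the literal string "true".
    return urlencode({name: "true" for name, enabled in flags.items() if enabled})


print(element_list_query(with_transcriptions=True, with_classes=True))
```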

Upgrade notes

To upgrade a development instance, follow this documentation.

To upgrade a production instance, you need to:

  • Deploy this release’s Docker image: registry.gitlab.teklia.com/arkindex/backend:1.6.2

  • Run the database migrations: docker exec ark-backend arkindex migrate

The main changes impacting developers and system administrators are detailed below.

S3 import worker

S3 import processes now use a worker instead of an internal Arkindex task. In order to be able to start S3 imports on your Arkindex instance, you now need to:

  • Set the docker.ingest_image setting in your configuration file: it defaults to registry.gitlab.teklia.com/arkindex/workers/import/s3:latest but specifying a tag is recommended, as latest does not guarantee stability.

  • Create the corresponding worker version on your Arkindex instance. As long as no worker version exists on the instance with the Docker image set in your settings, the system checks will display a warning when the backend starts up. You can create the worker version using the worker version publishing command in the CLI, or through the frontend:

    • Go to the workers list page, by clicking on Workers in the user dropdown menu (where your email address is displayed in the main navbar). Use the Create button to create a new worker. You can, for example, name it Elements Initialisation Worker, and set the type as init_worker.

    • Select your new worker in the workers list, and from the versions list on the right, use the Create button to create a new version. Set the Docker image from your settings as the Docker image reference, and leave the configuration empty.

If you do not have an S3 import worker version correctly set on your instance, you will not be able to launch any S3 import processes.

The current recommended docker.ingest_image setting is:

registry.gitlab.teklia.com/arkindex/workers/import/s3:0.1.0
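
Since a pinned tag is recommended over latest, a deployment script could verify this before starting the backend. The sketch below assumes the configuration file has already been parsed into a nested dictionary; the helper names are hypothetical, and image references using a digest are not handled.

```python
# Default value of docker.ingest_image, as stated in these notes.
DEFAULT_INGEST_IMAGE = "registry.gitlab.teklia.com/arkindex/workers/import/s3:latest"


def image_tag(image_ref):
    """Return the tag of a Docker image reference, or "latest" if it has none."""
    # Only look after the last "/", so that a registry port is not read as a tag.
    name = image_ref.rsplit("/", 1)[-1]
    return name.split(":", 1)[1] if ":" in name else "latest"


def ingest_image_is_pinned(config):
    """Check that docker.ingest_image points to a tag other than latest."""
    image = config.get("docker", {}).get("ingest_image", DEFAULT_INGEST_IMAGE)
    return image_tag(image) != "latest"


config = {"docker": {"ingest_image": "registry.gitlab.teklia.com/arkindex/workers/import/s3:0.1.0"}}
print(ingest_image_is_pinned(config))
```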

Timeout for sending verification emails

The account verification email is now sent via an asynchronous task. A new job_timeouts.send_verification_email setting is available. It defaults to 120 seconds.

Timeout for creating a process from Worker Activity failures

The CreateProcessFailures endpoint now creates the process in an asynchronous task, and notifies the user by email when it is available. A new job_timeouts.create_process_failures setting is available. It defaults to 3600 seconds.
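
The two new timeouts and their defaults can be summarised as follows. This sketch assumes the configuration has been parsed into a nested dictionary and uses a hypothetical job_timeout helper; it only shows how an explicit setting would take precedence over the defaults stated above.

```python
# Defaults in seconds, as stated in these notes.
DEFAULT_JOB_TIMEOUTS = {
    "send_verification_email": 120,
    "create_process_failures": 3600,
}


def job_timeout(config, job):
    """Return the timeout configured for a job, falling back to its default."""
    return config.get("job_timeouts", {}).get(job, DEFAULT_JOB_TIMEOUTS[job])


# Example: only the process creation timeout is overridden.
config = {"job_timeouts": {"create_process_failures": 7200}}
print(job_timeout(config, "send_verification_email"))
print(job_timeout(config, "create_process_failures"))
```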

Export queues

Arkindex 1.6.1 introduced the export Redis queue. In this release, we split exports between two distinct queues depending on their source, i.e. the database they are generated from.

The export Redis queue is now only used by the Enterprise Edition of Arkindex. If you are using the Community Edition, then your exports will run whether or not the export queue has assigned workers.

For Arkindex instances using the Enterprise Edition:

  • You can set up a dedicated database to run project exports from, using the database.export setting.

  • If you have set up such a database, then the exports made from this database will run on the export Redis queue. Exports created from the main database will run on the high Redis queue. WARNING: Exports created from the export database can only run if you assign workers to the export queue. For Docker Compose-based deployments, see our sample docker-compose.yml.
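
For Enterprise Edition instances, the routing rule above can be sketched as follows. The source names "default" and "export" and both function names are assumptions made for this example; only the export and high queue names come from these notes.

```python
def export_queue(source):
    """Return the Redis queue an Enterprise Edition export runs on."""
    # Assumption: "export" designates the dedicated export database,
    # and any other value designates the main database.
    return "export" if source == "export" else "high"


def export_can_run(source, queues_with_workers):
    """An export only runs if workers are assigned to its queue."""
    return export_queue(source) in queues_with_workers


# With workers assigned only to the high queue, exports created
# from the export database would never start.
print(export_can_run("export", {"high"}))
```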

IIIF user agent setting

A new iiif_user_agent setting is available. You can use it to specify the User-Agent header sent when checking images. This can help when accessing IIIF servers that block bots based on their user agent.
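
To show what this setting changes on the wire, here is a minimal sketch that prepares an HTTP request carrying a custom User-Agent header, using only the Python standard library. The URL and the function name are placeholders for this example; this is not the code Arkindex uses to check images, and the request is built without being sent.

```python
from urllib.request import Request


def image_check_request(url, user_agent):
    """Prepare a request for an image with a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


request = image_check_request(
    "https://iiif.example.com/image/info.json",  # placeholder IIIF server
    "Arkindex/1.6.2 (+https://teklia.com/)",
)
# urllib stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
```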

Doorbell removal

The Doorbell integration has been removed from the frontend. The doorbell.id and doorbell.appkey settings in the backend configuration, which were passed to the frontend to enable this integration, have been removed. If they are still set in your YAML configuration, they will not cause any errors or warnings, but you can now remove them.

Support requests may now be filed on our forum instead.