System workers

System workers are workers that provide essential Arkindex features, such as file imports, exports, and element initialization in processes. They are distinguished from other workers by a dedicated system worker type.

The purpose of system workers is to provide Feature worker versions, which are worker versions assigned to an Arkindex feature.

The following Arkindex features are available:

init_elements

The element initialization performed at the beginning of every inference process. It produces a list of all the elements in the process, which the workers then use.

file_import

All local file imports.

s3_ingest

Large file imports from S3 buckets.

thumbnails_generator

Generates thumbnails for all folders that were created during a file import.

dataset_extractor

Prepares a dataset for use in a training process.

There are also features for each document export mode:

  • pdf_export

  • pagexml_export

  • docx_export

  • csv_export

Feature worker versions

All worker versions support a feature field, and a worker version becomes a feature worker version when it is assigned one of these features.

There can only be one worker version for each Arkindex feature. If you want to manually assign a new worker version to a feature, you must first unassign the current feature worker version.

A feature worker version must:

  • be in the available state.

  • not require GPU usage or model usage.

  • have no required fields in its user configuration: it must be possible to run feature worker versions without setting any parameter. This means that these worker versions' configurations must have default values for all their items.
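
The three requirements above can be sketched as a simple check. This is only an illustration: the WorkerVersion class, its attribute names, the "required" usage value, and the shape of the user configuration are assumptions made for this example, not the actual Arkindex data model.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerVersion:
    """Hypothetical, simplified stand-in for an Arkindex worker version."""

    state: str                     # e.g. "available"
    gpu_usage: str = "disabled"    # assumed values: "disabled", "supported", "required"
    model_usage: str = "disabled"  # assumed values: "disabled", "supported", "required"
    # Assumed shape: {field_name: {"required": bool, "default": <value>}}
    user_configuration: dict = field(default_factory=dict)


def feature_eligibility_errors(version: WorkerVersion) -> list[str]:
    """Return the reasons why a version could not be assigned to a feature."""
    errors = []
    if version.state != "available":
        errors.append("worker version is not in the available state")
    if version.gpu_usage == "required":
        errors.append("worker version requires a GPU")
    if version.model_usage == "required":
        errors.append("worker version requires a model")
    for name, spec in version.user_configuration.items():
        # Every item must be optional and carry a default value, so that
        # the version can run without any parameter being set.
        if spec.get("required", False) or "default" not in spec:
            errors.append(f"user configuration field {name!r} needs a default value")
    return errors
```

An empty list means the version satisfies all three conditions; otherwise each entry names one unmet requirement.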

System administrators must take care to keep the feature worker versions up-to-date when updating Arkindex instances. See the system workers section of the deployment documentation.

Feature worker versions get updated automatically when Arkindex instances are upgraded. Instance administrators can update feature worker versions from the administration interface, but manual changes will be overwritten by these automated updates.

The source code for the workers providing these Arkindex features is available on GitLab.

The worker version that provides a feature can be retrieved directly using the RetrieveFeatureWorkerVersion API endpoint.
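
As a rough sketch, a lookup through this endpoint could be wrapped as follows. The helper name, the injected request callable, and the feature parameter name are assumptions made for illustration; check the API reference for the actual parameters of RetrieveFeatureWorkerVersion.

```python
from typing import Any, Callable

# Feature names listed earlier on this page.
KNOWN_FEATURES = {
    "init_elements",
    "file_import",
    "s3_ingest",
    "thumbnails_generator",
    "dataset_extractor",
    "pdf_export",
    "pagexml_export",
    "docx_export",
    "csv_export",
}


def get_feature_worker_version(request: Callable[..., Any], feature: str) -> Any:
    """Fetch the worker version providing a feature (hypothetical helper).

    `request` is any callable taking an operation ID followed by keyword
    parameters and returning the decoded API response.
    """
    if feature not in KNOWN_FEATURES:
        raise ValueError(f"Unknown Arkindex feature: {feature!r}")
    # Assumption: the endpoint takes the feature name as a `feature` parameter.
    return request("RetrieveFeatureWorkerVersion", feature=feature)
```

Injecting the request callable keeps the helper independent of any particular API client; with a Python API client that exposes a request(operation_id, **params) method, that bound method could presumably be passed in directly.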

Hidden features

All features except thumbnails_generator and dataset_extractor are marked as hidden. When a feature is hidden, any worker with a worker version that provides this feature is hidden from the workers list and from the ListWorkers API endpoint.

No other parts of Arkindex are affected by a feature being hidden. For example, it is still possible to access a worker with a hidden feature through the RetrieveWorker API endpoint.
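
The listing rule can be illustrated with a small filter. The dictionary shape used for workers here (a "versions" list whose items carry a "feature" key) is an assumption for this example and not the actual API response format.

```python
# Hidden features, as described above: every feature except
# thumbnails_generator and dataset_extractor.
HIDDEN_FEATURES = {
    "init_elements",
    "file_import",
    "s3_ingest",
    "pdf_export",
    "pagexml_export",
    "docx_export",
    "csv_export",
}


def is_listed(worker: dict) -> bool:
    """Return True if the worker would appear in the workers list."""
    features = {version.get("feature") for version in worker.get("versions", [])}
    # A single version providing a hidden feature hides the whole worker.
    return not (features & HIDDEN_FEATURES)


def visible_workers(workers: list[dict]) -> list[dict]:
    """Keep only the workers that are not hidden by one of their versions."""
    return [worker for worker in workers if is_listed(worker)]
```

Only the listing is filtered in this sketch; retrieving a single worker by its identifier stays possible, as in the RetrieveWorker case above.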