Specification

Rules

  • Each Arkindex worker must be defined by a single YAML file.

  • These files must be placed in an arkindex folder, at the root of the Git repository hosting the worker code.

  • These YAML files must be fully self-contained: links to other files are not supported.

Base parameters

A worker configuration must contain the following information.

Table 1. Base parameters

slug
  Required: Yes
  Description: Unique name of your worker
  Limitations: Only alphanumeric characters, dashes and underscores can be used.

display_name
  Required: Yes
  Description: Display name of your worker
  Limitations: Not necessarily unique, can use spaces.

type
  Required: Yes
  Description: Worker category, not limited to specific choices. This is used to filter workers in the Arkindex interface.
  Limitations: Free-form text

description
  Required: Yes
  Description: Text describing your worker features. Markdown is supported.
  Limitations: Free-form text

gpu_usage
  Required: No
  Description: Specify if a GPU is needed to run the worker
  Limitations: Allowed values are disabled, supported and required. Default value is disabled.

model_usage
  Required: No
  Description: Specify if a Machine Learning model provided by Arkindex is needed to run the worker
  Limitations: Allowed values are disabled, supported and required. Default value is disabled.

Example

slug: my_worker
display_name: My wonderful worker
type: transcriber
gpu_usage: supported

description: >
  Explain here why this is a wonderful worker

Docker parameters

A worker configuration using Docker execution should contain the following information. If the section is omitted, the default values are used.

docker.command
  Description: Command to execute when starting the worker
  Default: The default CMD from the Dockerfile

docker.shm_size
  Description: Size of /dev/shm when starting the worker. The format is <number><unit>, where number must be greater than 0. The unit is optional and can be b (bytes), k (kilobytes), m (megabytes), or g (gigabytes). If you omit the unit, the system uses bytes.
  Default: 64m

Example

docker:
  command: my-worker
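
The example above relies on the default shared memory size. As a sketch, a worker that needs a larger /dev/shm could set both keys; the 1g value is an arbitrary illustration, not a recommended setting.

```yaml
docker:
  # Command executed when the worker starts, instead of the Dockerfile CMD
  command: my-worker
  # <number><unit>: 1 gigabyte of /dev/shm instead of the 64m default
  shm_size: 1g
```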

Configuration options

Finally, the configuration file can have configuration options that can be viewed and edited by the end-user when starting a worker in Arkindex.

All these options are listed under an optional configuration key.

Table 2. Content of one configuration option

key
  Description: Unique name of the configuration option. That name will be provided to the worker code when running.
  Required: Yes

display_name
  Description: Free-form name explaining the role of the option to the end-user. Should be one line.
  Required: Yes

help_text
  Description: Optional text explaining that option in more detail. Can be multiple lines.
  Required: No

type
  Description: Type of the key, one of bool, dict, element_type, enum, float, group, int, model, secret, string, text or worker_version. See details below.
  Required: Yes

default
  Description: Specify a default value for this option.
  Required: Required when not editable. When many is set to true, the default value must also be a list.

required
  Description: Is that option required to be set by the end-user? Defaults to not required.
  Required: No

editable
  Description: Is that option editable by the end-user? Defaults to true. When an option is not editable, the option will always be set to its default value.
  Required: No

choices
  Description: A list of possible choices when type is enum
  Required: Required (with at least 2 values) when type is enum. An error will be raised when this is specified for other types.

children
  Description: A list of configuration options with this structure, when type is group
  Required: Required (with at least 2 values) when type is group. An error will be raised when this is specified for other types.

many
  Description: Specify if the end-user can specify multiple items of that type. The worker will get a list of these items. Defaults to false. Supported types are int, float, string, worker_version and element_type.
  Required: No
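
As a sketch of how these keys combine, the option below uses help_text and many together with a list default. The key name, labels and values are hypothetical and only illustrate the rules listed in Table 2.

```yaml
configuration:
  - key: languages            # hypothetical option name
    display_name: Languages to recognize
    help_text: |
      Language codes the worker should try,
      in order of preference.
    type: string
    many: true                # the worker receives a list of strings
    default:                  # must be a list because many is true
      - fr
      - en
```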

Types

All the available configuration types are detailed below.

bool

This is a boolean value, the only possible values are true or false.

Example
configuration:
  - key: publish_transcriptions
    display_name: Publish generated transcriptions
    type: bool
    default: true

dict

The dict type only contains string keys and string values. You can store numerical values but they will be provided as strings to the workers.
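
For instance, numerical settings can be written as quoted strings, which is how the worker will receive them anyway. This is a minimal sketch with hypothetical key and label names; the worker code is assumed to convert the values back to numbers itself.

```yaml
configuration:
  - key: min_confidence       # hypothetical option name
    display_name: Minimum confidence required for each class
    type: dict
    default:
      # Values reach the worker as the strings "0.5" and "0.75"
      text_line: "0.5"
      illustration: "0.75"
```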

Example
configuration:
  - key: labels
    display_name: Mapping to convert classification labels into human-readable output
    type: dict
    default:
      "0": "Class A"
      "1": "Class B"
      "2": "Class C"

element_type

This allows the end-user to select an Arkindex Element Type from the project they are running a process from.

The type will be provided to the worker as its slug (for example, the worker will receive page as the value, not a UUID).

No default is allowed here.

Example
configuration:
  - key: page_type
    display_name: Select the element type representing a single page
    type: element_type
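
Since element_type is one of the types supported by many, a worker can also ask for several element types at once. This is a sketch with a hypothetical key name; no default is set because none is allowed for this type.

```yaml
configuration:
  - key: line_types           # hypothetical option name
    display_name: Select the element types representing text lines
    type: element_type
    many: true                # the worker receives a list of type slugs
    required: true
```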

enum

The enum type allows picking a string from a known list of choices.

When that type is used, the choices parameter must be a list of strings. Only one of these values will be accepted.

Example
configuration:
  - key: color_mode
    display_name: Color mode to open the image
    type: enum
    choices:
      - RGB
      - RGBA
      - CMYK

float

This type represents a numerical value using single-precision floating-point format.

However, it does not support NaN or ±Infinity notations (i.e. .nan, .inf, +.inf and -.inf are not allowed).

Example
configuration:
  - key: threshold_x
    display_name: Some acceptance threshold
    type: float
    default: 1.3

group

This type allows developers to group parameters together, so that they make more sense to the end-user. They will be displayed in a dedicated section, whose title is the group display_name.

The child options in a group can be of any type mentioned on this page, the same rules apply.

The values provided to the worker will be named as {key of group}.{key of child option}. For example, if you have a group with key demo and a child option keyed option_x, the worker will receive a demo.option_x value.

The group type has plenty of special behaviors:

  • the children must have at least one item;

  • it does not directly support default, required or editable. Only its child options can have these options set;

  • only one level of children is possible: you cannot have a group inside a group;

  • if there is a key conflict against another option (in the group or in any option from the worker), the configuration will be invalid and will not be usable.

Example
configuration:
  - key: resize
    display_name: Resizing options
    type: group
    children:
      - key: ratio_x
        display_name: Ratio on X-axis
        type: float
      - key: ratio_y
        display_name: Ratio on Y-axis
        type: float
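
The naming rule described above can be illustrated with the demo group and its option_x child. This is a sketch: the second child, the display names and the default values are hypothetical.

```yaml
configuration:
  - key: demo
    display_name: Demo options
    type: group               # no default, required or editable at this level
    children:
      - key: option_x         # provided to the worker as demo.option_x
        display_name: Option X
        type: int
        default: 1
      - key: option_y         # provided to the worker as demo.option_y
        display_name: Option Y
        type: bool
        default: false
```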

int

This type represents a numerical value using integer format.

Values can be positive or negative.

Example
configuration:
  - key: nb_layers
    display_name: Number of layers to process
    type: int
    default: 256

model

This type allows a user to select an available Arkindex model (not a model version). This is generally useful to publish a new version after Machine Learning training or fine-tuning.

No default is allowed here.

Example
configuration:
  - key: target_model
    display_name: Arkindex model that will receive the new version trained
    type: model

secret

This type allows a user to select an available Arkindex secret. The worker will receive the name of the secret.

No default is allowed here.

Example
configuration:
  - key: openai_token
    display_name: OpenAI secret key
    type: secret

string and text

Both of these types (string and text) represent strings. The only difference is that the text type is displayed to the end-user as a larger text box than the string type.

The HTML elements used in the Arkindex frontend are:

  • <input type="text" /> for string

  • <textarea /> for text

Example
configuration:
  - key: llm_prompt
    display_name: Magical LLM Prompt that will solve everything
    type: text
    default: Why is the sky blue?

worker_version

This allows the end-user to select an Arkindex Worker Version. Only worker versions from accessible workers will be usable by the end-user.

The worker version selected will be provided to the worker as its ID (UUID format).

No default is allowed here.

Example
configuration:
  - key: transcription_worker
    display_name: Worker version that generated the transcriptions to process
    type: worker_version

Full Example

slug: my_worker
display_name: My wonderful worker
type: transcriber
gpu_usage: supported
model_usage: required

description: >
  Explain here why this is a wonderful worker

docker:
  command: my-worker

configuration:
  - key: fixed_height
    display_name: Fixed image height
    type: int
    default: 128
    editable: false

  - key: use_language_model
    display_name: Use a language model if available
    type: bool
    required: false
    default: false

  - key: some_ratio
    type: float
    default: 3.68
    display_name: Magic ratio

  - key: line_element_type
    type: string
    default: text_line
    display_name: Line element type

  - key: some_list
    display_name: An example list of strings
    type: string
    many: true

  - key: scale
    type: group
    display_name: Rescale options
    children:
      - key: x
        display_name: Rescale the image on the X axis
        type: float
        default: 1.0
      - key: y
        display_name: Rescale the image on the Y axis
        type: float
        default: 1.0

  - key: color_mode
    display_name: Polygon extraction parameter | Color mode of the images
    required: true
    default: L
    type: enum
    choices:
      - L
      - RGB
      - RGBA

  - key: openai_token
    display_name: OpenAI secret key
    type: secret