Lifecycle - Vast.ai mode

The lifecycle of a Ponos agent in Vast.ai mode is similar to that of the generic Ponos agent. Only the actions specific to task management differ.

Setup

In addition to the generic Ponos agent setup, the Ponos agent in Vast.ai mode will:

Loop

During its loop, the Ponos agent running in Vast.ai mode will:

  • list existing instances using the show instances API call,

  • order a new instance for each task, applying the configured offer filters to find the best & cheapest offer possible,

  • check on the running instances,

  • report their state to Arkindex,

  • read instance logs and publish them on Arkindex S3 storage,

  • stop instances.

The agent does not download any artifact or Machine Learning model, as it cannot write to the remote instance’s disk.
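The offer selection step of the loop can be illustrated with a minimal sketch. It is not the agent's actual code: the function name, the two filter parameters and the offer field names (num_gpus, gpu_ram, dph_total) are assumptions made for the example, and "best" is simplified here to "cheapest offer that passes every filter".

```python
from typing import Optional


def pick_offer(offers: list[dict], max_price: float, min_gpu_ram_mb: int) -> Optional[dict]:
    """Return the cheapest single-GPU offer that passes the filters, or None.

    The field names used here are assumptions for this sketch.
    """
    candidates = [
        offer
        for offer in offers
        if offer["num_gpus"] == 1  # instances always have exactly one GPU
        and offer["gpu_ram"] >= min_gpu_ram_mb
        and offer["dph_total"] <= max_price  # price per hour
    ]
    # min() with default=None avoids an exception when nothing matches
    return min(candidates, key=lambda offer: offer["dph_total"], default=None)
```

When no offer matches, the function returns None, which a caller could treat as "retry on the next loop iteration".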

Vast.ai instances

Vast.ai instances are used to run Ponos tasks, using the same Docker images as when the agent runs in docker mode.

Each instance created by the Ponos agent has these settings:

  • one GPU (and only one) is always present

  • the same environment variables as for docker mode are provided to the container, along with the ones provided by Vast.ai

  • we also set TASK_BOOTSTRAP & TASK_PARENTS (see below for details)

  • there is no data persistence, so the workers need to publish their artifacts themselves
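As a rough illustration of the environment described above, the sketch below merges the docker-mode variables with the two extra ones. The function name and its arguments are hypothetical; only the variable names TASK_BOOTSTRAP and TASK_PARENTS and their formats come from this page.

```python
import json


def build_instance_env(docker_env: dict[str, str], parents: dict[str, str]) -> dict[str, str]:
    """Return the container environment for a Vast.ai instance (sketch).

    docker_env: variables the agent would already set in docker mode.
    parents: mapping of parent task ID to parent task slug.
    """
    env = dict(docker_env)  # copy, so the caller's mapping is left untouched
    env["TASK_BOOTSTRAP"] = "true"
    # Environment variables are strings, so the mapping is serialized as JSON
    env["TASK_PARENTS"] = json.dumps(parents)
    return env
```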

Bootstrapping instances

As the Ponos agent cannot write to the remote instance’s disk, the worker is solely responsible for downloading parent task artifacts and Machine Learning models before running any actual business code.

To do so, the Ponos agent provides two extra environment variables:

  • TASK_BOOTSTRAP=true, to trigger the specific local bootstrap mode

  • TASK_PARENTS is a JSON mapping from Arkindex task IDs (keys) to the corresponding task slugs (values), so that the worker can build a local file system with all parent task artifacts.

All this bootstrapping is done by base-worker >= 0.5.3.
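To show what these two variables look like from the worker's side, here is a minimal sketch that reads them and prepares one local directory per parent task. It is not base-worker's implementation: the function name and the "one directory per slug" layout are assumptions, and the actual artifact download is left out.

```python
import json
from pathlib import Path
from typing import Mapping


def parent_dirs(environ: Mapping[str, str], base_dir: Path) -> dict[str, Path]:
    """Create one directory per parent task and return them keyed by task ID.

    Returns an empty mapping when bootstrap mode is not enabled.
    The directory layout (base_dir / slug) is an assumption of this sketch.
    """
    if environ.get("TASK_BOOTSTRAP") != "true":
        return {}

    # TASK_PARENTS maps task IDs (keys) to task slugs (values)
    parent_slugs = json.loads(environ.get("TASK_PARENTS", "{}"))

    dirs = {}
    for task_id, slug in parent_slugs.items():
        path = base_dir / slug
        path.mkdir(parents=True, exist_ok=True)
        dirs[task_id] = path  # artifacts of this parent would be downloaded here
    return dirs
```

A worker could call it as parent_dirs(os.environ, Path("some/local/dir")) before starting its own processing.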