Lifecycle - Vast.ai mode
The lifecycle of a Ponos agent in Vast.ai mode is similar to that of the generic Ponos agent. Only the actions specific to task management differ.
Setup
In addition to the generic Ponos agent setup, the Ponos agent in Vast.ai mode will:
- Connect to the Vast.ai API, using the vast_api.api_key parameter of its configuration.
- Verify that the configured vast_api.offer_filters are supported.
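The setup checks can be illustrated with a minimal sketch, assuming the configuration has already been loaded into a Python dict. Only the vast_api.api_key and vast_api.offer_filters keys come from this page; the SUPPORTED_FILTERS set, the VastApiConfig class and the load_vast_config function are hypothetical, and the actual connection to the Vast.ai API is left out.

```python
from dataclasses import dataclass, field

# Hypothetical whitelist: the filters the real agent supports may differ.
SUPPORTED_FILTERS = {"gpu_name", "gpu_ram", "reliability", "dph_total", "cuda_max_good"}


@dataclass
class VastApiConfig:
    """Hypothetical in-memory view of the vast_api configuration section."""

    api_key: str
    offer_filters: dict = field(default_factory=dict)


def load_vast_config(raw: dict) -> VastApiConfig:
    """Build the vast_api section and reject it early if it is invalid."""
    section = raw.get("vast_api") or {}
    api_key = section.get("api_key")
    if not api_key:
        raise ValueError("vast_api.api_key is required in Vast.ai mode")
    filters = section.get("offer_filters") or {}
    # Any filter name outside the whitelist makes the setup fail.
    unsupported = sorted(set(filters) - SUPPORTED_FILTERS)
    if unsupported:
        raise ValueError(f"Unsupported vast_api.offer_filters: {', '.join(unsupported)}")
    return VastApiConfig(api_key=api_key, offer_filters=dict(filters))
```

Failing at setup time means a typo in the filters is reported once, when the agent starts, rather than on every loop iteration.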
Loop
During its loop, the Ponos agent running in Vast.ai mode will:
- list existing instances using the show instances API call,
- order a new instance for each task, applying the configured offer filters to find the best and cheapest offer available,
- check on the running instances,
- report their state to Arkindex,
- read instance logs and publish them to the Arkindex S3 storage,
- stop instances.
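One iteration of this loop can be sketched as follows, assuming hypothetical VastClient and Arkindex wrappers (every method name below is an illustration, not the agent's or Vast.ai's actual API). The offer filters are assumed to be applied by search_offers, and the sketch then keeps the cheapest offer with exactly one GPU; the dph_total price field and the terminal state names are also assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass
class Offer:
    id: int
    num_gpus: int
    dph_total: float  # hourly price; field name is an assumption


def pick_cheapest_offer(offers: List[Offer]) -> Optional[Offer]:
    """Return the cheapest offer with exactly one GPU, or None if there is none."""
    single_gpu = [o for o in offers if o.num_gpus == 1]
    return min(single_gpu, key=lambda o: o.dph_total, default=None)


class VastClient(Protocol):
    """Hypothetical wrapper around the Vast.ai API calls used by the agent."""

    def show_instances(self) -> Dict[int, str]: ...  # instance ID -> state
    def search_offers(self, filters: dict) -> List[Offer]: ...
    def create_instance(self, offer_id: int, task_id: str) -> int: ...
    def logs(self, instance_id: int) -> str: ...
    def stop_instance(self, instance_id: int) -> None: ...


class Arkindex(Protocol):
    """Hypothetical wrapper around the Arkindex API and its S3 storage."""

    def update_state(self, task_id: str, state: str) -> None: ...
    def upload_logs(self, task_id: str, text: str) -> None: ...


FINISHED = {"exited", "failed"}  # hypothetical terminal instance states


def run_once(vast: VastClient, arkindex: Arkindex, filters: dict,
             pending: List[str], running: Dict[str, int]) -> None:
    """One loop iteration; `pending` and `running` are updated in place."""
    # List existing instances (the "show instances" call).
    states = vast.show_instances()
    # Order one instance per pending task, using the configured offer filters.
    for task_id in list(pending):
        offer = pick_cheapest_offer(vast.search_offers(filters))
        if offer is not None:
            running[task_id] = vast.create_instance(offer.id, task_id)
            pending.remove(task_id)
    # Check running instances, report their state, publish logs, stop finished ones.
    for task_id, instance_id in list(running.items()):
        state = states.get(instance_id, "starting")
        arkindex.update_state(task_id, state)
        arkindex.upload_logs(task_id, vast.logs(instance_id))
        if state in FINISHED:
            vast.stop_instance(instance_id)
            del running[task_id]
```

A task for which no matching offer is found simply stays in `pending` and is retried on the next iteration.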
The agent does not download any artifacts or Machine Learning models, as it cannot write them to the remote instance's disk.
Vast.ai instances
Vast.ai instances are used to run Ponos tasks, using the same Docker images as when the agent runs in Docker mode.
Each instance created by the ponos agent has these settings:
- exactly one GPU is always present,
- the container receives the same environment variables as in Docker mode, along with the ones provided by Vast.ai,
- TASK_BOOTSTRAP and TASK_PARENTS are also set (see below for details),
- there is no data persistence, so workers must publish their artifacts themselves.
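How the agent could assemble the container environment is shown in this sketch, assuming the Docker-mode variables are already available as a dict. The function name instance_environment is hypothetical; only TASK_BOOTSTRAP, TASK_PARENTS and their meaning come from this page. The variables that Vast.ai injects itself are added on the instance and are therefore not part of the sketch.

```python
import json
from typing import Dict


def instance_environment(docker_env: Dict[str, str], parents: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: Docker-mode variables plus the two bootstrap variables."""
    env = dict(docker_env)  # same variables as in Docker mode
    env["TASK_BOOTSTRAP"] = "true"  # tells the worker to bootstrap itself
    env["TASK_PARENTS"] = json.dumps(parents)  # {parent task ID: parent task slug}
    return env
```

Serializing TASK_PARENTS as JSON keeps the mapping in a single environment variable, which the worker decodes at startup.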
Bootstrapping instances
As the Ponos agent cannot write to the remote instance's disk, the worker is solely responsible for downloading parent task artifacts and Machine Learning models before running any actual business code.
To do so, the Ponos agent provides two extra environment variables:
- TASK_BOOTSTRAP=true, which triggers the specific local bootstrap mode,
- TASK_PARENTS, a JSON mapping of Arkindex task IDs (as keys) to the corresponding task slugs (as values), so that the worker can build a local file system with all parent task artifacts.
All of this bootstrapping is handled by base-worker 0.5.3 and later.
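The worker side of this contract can be illustrated with the minimal sketch below. It is not base-worker's implementation: the function names and the one-directory-per-slug layout are hypothetical, and the actual download of artifacts and models is omitted.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping


def parse_bootstrap_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return {parent task ID: slug} when bootstrap mode is on, else an empty dict."""
    if environ.get("TASK_BOOTSTRAP") != "true":
        return {}
    return json.loads(environ.get("TASK_PARENTS", "{}"))


def prepare_parent_dirs(parents: Dict[str, str], root: Path) -> Dict[str, Path]:
    """Create one directory per parent task, named after its slug (hypothetical layout).

    The parent task artifacts would then be downloaded into these directories.
    """
    dirs = {}
    for task_id, slug in parents.items():
        path = root / slug
        path.mkdir(parents=True, exist_ok=True)
        dirs[task_id] = path
    return dirs
```

Keying the directories by slug rather than by task ID gives the worker stable, human-readable paths, while the task IDs are still needed to fetch each parent's artifacts from Arkindex.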