Lifecycle
The lifecycle of a Ponos agent has two main stages:
- Setup, which is done once at the beginning.
- Loop, which is repeated indefinitely until the Ponos agent is stopped.
Setup
When a Ponos agent starts, it sets up its environment once. The Ponos agent will:
- Set up Sentry (using the `sentry` parameter of its configuration).
- Set up the Arkindex client (using the `url` parameter of its configuration).
- Set up logging (using the `logging` parameter of its configuration).
- Check that there is no other Ponos agent on the same host. It uses a unique file (from the `pid_file` parameter of its configuration) containing the PID of the Ponos agent currently running. If this file contains a PID that differs from that of the current Ponos agent and corresponds to a running program, the Ponos agent stops.
- Mark its presence on the host. It writes its PID to the unique file (from the `pid_file` parameter of its configuration).
- Create a folder (using the `data_dir` parameter of its configuration) to store the files needed later to process Arkindex tasks.
- Register to Arkindex (using the `CreateAgent` endpoint and the `farm_id` and `seed` parameters of its configuration).
- List the tasks running on the host.
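The PID file check and the presence marker described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the agent's actual code: the function names `read_pid`, `is_running` and `claim_pid_file` are hypothetical, the liveness test relies on POSIX signal semantics, and the check-then-write sequence is not atomic.

```python
import os
from pathlib import Path
from typing import Optional


def read_pid(pid_file: Path) -> Optional[int]:
    """Return the PID stored in pid_file, or None if it is missing or invalid."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None
    return pid if pid > 0 else None


def is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX semantics)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process can be signalled
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def claim_pid_file(pid_file: Path) -> bool:
    """Write our PID to pid_file unless another live process already owns it."""
    own_pid = os.getpid()
    other_pid = read_pid(pid_file)
    if other_pid is not None and other_pid != own_pid and is_running(other_pid):
        # Hypothetical convention: the caller stops the agent on False.
        return False
    pid_file.write_text(str(own_pid))
    return True
```

A stale file, whose PID no longer corresponds to a running program, is simply overwritten, which matches the rule that only a live process with a different PID makes the agent stop.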
Loop
Once the Ponos agent’s setup is complete, it loops indefinitely to track the processing of its tasks and synchronize their state with Arkindex. At each iteration it will:
- Check whether the Ponos agent is ready. By default, the Ponos agent is always ready, but this condition depends on the type of Ponos agent used.
- Check running tasks on the host. For each task, the Ponos agent will:
  - Check that the task is still running. If the Ponos agent cannot find the task, it will update its state to `Error` (using the `PartialUpdateTask` endpoint). The task will no longer be listed as a running task.
  - Upload the task’s logs using the associated S3 URL.
  - Check whether the task is finished. If the task is finished, it will update its state to `Completed` or `Failed` according to its exit code (using the `PartialUpdateTask` endpoint) and upload its artifacts (using the `CreateArtifact` endpoint). The task will no longer be listed as a running task.
- Retrieve the list of actions (using the `RetrieveAgentActions` endpoint) and process each action.
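The per-task check performed at each iteration can be sketched as below. Everything here is a hypothetical stand-in: `FakeTask`, `final_state` and `check_running_tasks` are illustrative names, the `updates` list replaces real calls to `PartialUpdateTask`, and mapping exit code 0 to `Completed` is an assumption based on the usual process convention, since the text only says the state depends on the exit code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FakeTask:
    """Hypothetical stand-in for a task tracked by the agent."""
    found: bool = True               # False when the agent cannot find the task
    exit_code: Optional[int] = None  # None while the task is still running


def final_state(exit_code: int) -> str:
    # Assumption: exit code 0 means success, anything else means failure.
    return "Completed" if exit_code == 0 else "Failed"


def check_running_tasks(
    running: Dict[str, FakeTask], updates: List[Tuple[str, str]]
) -> None:
    """Run one pass over the running tasks, recording state updates."""
    # Iterate over a copy because finished tasks are removed from `running`.
    for task_id, task in list(running.items()):
        if not task.found:
            updates.append((task_id, "Error"))
            del running[task_id]
            continue
        # A real agent would upload the task's logs here.
        if task.exit_code is not None:
            updates.append((task_id, final_state(task.exit_code)))
            # A real agent would also upload the task's artifacts here.
            del running[task_id]
```

Tasks that are found and have no exit code yet stay in `running` and are checked again on the next iteration.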
Action
An action either starts or stops a task.
Start task
To start a task, the Ponos agent will:
- Check that the task is not already running. If the task is already running, the Ponos agent will ignore the action.
- Check that the task is correctly assigned. If the task is assigned to another Ponos agent, the Ponos agent will ignore the action.
- Download the task’s artifacts (artifacts of parent tasks, using the https://arkindex.teklia.com/api-docs/#tag/ponos/operation/RetrieveTaskDefinition[`RetrieveTaskDefinition`] and `ListArtifacts` endpoints) and the task’s extra files (such as models), and store them in a specific folder (using the `data_dir` parameter of its configuration).
- Start the task according to the type of Ponos agent used.
- Update the task’s state to `Running` (using the `PartialUpdateTask` endpoint).
- Add the task to its list of running tasks.
If an error occurs during any of the above steps, the Ponos agent will update the task’s state to `Error` (using the `PartialUpdateTask` endpoint).
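The control flow of the start action, including the fallback to `Error`, can be sketched like this. The function `start_task`, its parameters and its return values are hypothetical: `steps` stands for the download and start operations, and `update_state` stands for a call to `PartialUpdateTask`.

```python
from typing import Callable, Iterable, Set


def start_task(
    task_id: str,
    assigned_agent: str,
    agent_id: str,
    running: Set[str],
    steps: Iterable[Callable[[str], None]],
    update_state: Callable[[str, str], None],
) -> str:
    """Start a task, returning 'ignored', 'started' or 'error'."""
    if task_id in running:
        return "ignored"  # the task is already running
    if assigned_agent != agent_id:
        return "ignored"  # the task belongs to another agent
    try:
        for step in steps:  # e.g. download artifacts, then start the task
            step(task_id)
        update_state(task_id, "Running")
    except Exception:
        # Any failure in the steps above marks the task as Error.
        update_state(task_id, "Error")
        return "error"
    running.add(task_id)
    return "started"
```

In this sketch a task is added to the running set only after every step has succeeded, so a failed start never leaves a task tracked as running.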
Stop task
To stop a task, the Ponos agent will:
- Check whether the task is running. If the task is still running, the Ponos agent will stop it. The task will no longer be listed as a running task.
- Update the task’s state to `Stopped` (using the `PartialUpdateTask` endpoint).
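The stop action is shorter and can be sketched in the same hypothetical style: `stop_task`, `kill` and `update_state` are illustrative names, with `kill` standing for whatever mechanism the agent type uses to stop a task and `update_state` for a call to `PartialUpdateTask`.

```python
from typing import Callable, Set


def stop_task(
    task_id: str,
    running: Set[str],
    kill: Callable[[str], None],
    update_state: Callable[[str, str], None],
) -> None:
    """Stop a task if it is running, then report it as Stopped."""
    if task_id in running:
        kill(task_id)
        running.discard(task_id)  # no longer listed as a running task
    # The state is updated whether or not the task was still running.
    update_state(task_id, "Stopped")
```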
Generic Ponos agent
The Ponos agent lifecycle described above is as generic as possible. Two agents currently implement this lifecycle: one using Docker Engine for standard hardware, and one using Slurm for supercomputers.
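One way to picture this split between a generic lifecycle and agent-specific behaviour is an abstract base class whose hooks are overridden per agent type. The sketch below is purely illustrative: `BaseAgent` and `InMemoryAgent` are hypothetical names, and the toy subclass stands in for a real backend without calling Docker or Slurm.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseAgent(ABC):
    """Hypothetical generic agent; subclasses supply backend-specific hooks."""

    def is_ready(self) -> bool:
        # By default the agent is always ready; subclasses may override this.
        return True

    @abstractmethod
    def start(self, task_id: str) -> None:
        """Start a task on the backend."""

    @abstractmethod
    def stop(self, task_id: str) -> None:
        """Stop a task on the backend."""


class InMemoryAgent(BaseAgent):
    """Toy backend that only records tasks, with a limit on concurrent tasks."""

    def __init__(self, max_tasks: int = 1) -> None:
        self.max_tasks = max_tasks
        self.started: List[str] = []

    def is_ready(self) -> bool:
        return len(self.started) < self.max_tasks

    def start(self, task_id: str) -> None:
        self.started.append(task_id)

    def stop(self, task_id: str) -> None:
        self.started.remove(task_id)
```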