Lifecycle

The lifecycle of a Ponos agent has two main stages:

Setup, which is done once at the beginning.
Loop, which is repeated until the Ponos agent is stopped.

graph TB
    start[Start Ponos agent] --> setup
    setup --> loop
    subgraph setup[Setup]
        direction LR
        setup_tools[Setup tools] --> check_pid{Other Ponos agent running?}
        check_pid --> |Yes| exit[Exit]
        check_pid --> |No| write_pid[Mark Ponos agent as running]
        write_pid --> register[Register Ponos agent on Arkindex]
        register --> list_existing_tasks[List running tasks]
    end
    subgraph loop[Loop]
        direction LR
        ready{Ready?} --> |No| ready
        ready --> |Yes| check_tasks
        subgraph check_tasks[Check running tasks]
            direction TB
            upload_logs[Upload task's logs] --> task_state{Task's state?}
            task_state --> |Finished| update_state_finished[Update task's state to `Completed` or `Failed`]
        end
        check_tasks --> get_actions[Retrieve actions from Arkindex]
        get_actions --> action{Action?}
        action --> action_start[Start Task]
        action --> action_stop[Stop Task]
        subgraph action_start[Start task]
            direction TB
            download_files[Download files] --> start_task[Start task]
            start_task --> update_state_running[Update task's state to `Running`]
        end
        subgraph action_stop[Stop task]
            direction TB
            stop_task[Stop task] --> update_state_stopped[Update task's state to `Stopped`]
        end
    end

Setup

When a Ponos agent starts, it will set up its environment once at the beginning. The Ponos agent will:

Set up Sentry (using the sentry parameter of its configuration).
Set up the Arkindex client to use (using the url parameter of its configuration).
Set up the logging (using the logging parameter of its configuration).
Check that there is no other Ponos agent on the same host. It uses a unique file (from the pid_file parameter of its configuration) containing the PID of the Ponos agent currently running. If this file contains a PID that differs from the current Ponos agent’s PID and that corresponds to a running process, the Ponos agent stops.
Mark its presence on the host. It writes its PID to the unique file (from the pid_file parameter of its configuration).
Create a folder (using the data_dir parameter of its configuration) to store the files needed later to process Arkindex tasks.
Register to Arkindex (using the CreateAgent endpoint and the farm_id and seed parameters of its configuration).
List the tasks running on the host.
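
The PID file check described in the steps above can be sketched as follows. This is a minimal illustration, not the Ponos implementation: the helper names pid_is_running and acquire_pid_file are hypothetical, and the liveness test relies on os.kill(pid, 0), which assumes a POSIX host.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        # Signal 0 only performs the existence and permission checks.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def acquire_pid_file(pid_file: Path) -> bool:
    """Write our PID to pid_file unless another live process owns it.

    Returns False when the caller should exit because another agent runs.
    """
    current = os.getpid()
    if pid_file.exists():
        content = pid_file.read_text().strip()
        if content.isdigit():
            other = int(content)
            if other != current and pid_is_running(other):
                return False
    # No file, a stale PID, or our own PID: mark our presence on the host.
    pid_file.write_text(str(current))
    return True
```

A stale file left behind by a crashed agent does not block a restart in this sketch, because its PID no longer matches a running process.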

Loop

Once the Ponos agent’s setup is complete, it loops indefinitely to track the processing of its tasks and synchronize their state with Arkindex. On each iteration, it will:

Check whether the Ponos agent is ready. By default, the Ponos agent is always ready, but this condition depends on the type of Ponos agent used.
Check running tasks on the host. For each task, the Ponos agent will:
- Check that the task is still running. If the Ponos agent cannot find the task, it will update its state to Error (using the PartialUpdateTask endpoint). The task will no longer be listed as a running task.
- Upload the task’s logs using the associated S3 URL.
- Check whether the task is finished. If the task is finished, it will update its state to Completed or Failed according to its exit code (using the PartialUpdateTask endpoint) and upload its artifacts (using the CreateArtifact endpoint). The task will no longer be listed as a running task.
Retrieve the list of actions (using the RetrieveAgentActions endpoint) and process each action.
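
The task check performed on each iteration can be sketched as a pure function over the list of running tasks. The RunningTask type and the function names are hypothetical, and the mapping of exit code 0 to Completed and any other code to Failed is an assumption (the usual convention); log uploads, artifact uploads and the calls to PartialUpdateTask are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RunningTask:
    task_id: str
    found: bool = True               # False when the agent cannot find the task
    exit_code: Optional[int] = None  # None while the task is still running


def next_state(task: RunningTask) -> Optional[str]:
    """Return the state to report, or None if the task is still running."""
    if not task.found:
        return "Error"
    if task.exit_code is None:
        return None
    # Assumption: exit code 0 means success.
    return "Completed" if task.exit_code == 0 else "Failed"


def check_running_tasks(running: Dict[str, RunningTask]) -> Dict[str, str]:
    """Collect state updates and drop finished or lost tasks from `running`."""
    updates: Dict[str, str] = {}
    for task_id, task in list(running.items()):
        state = next_state(task)
        if state is not None:
            updates[task_id] = state
            del running[task_id]  # no longer listed as a running task
    return updates
```

In a real agent, each entry of the returned mapping would translate into one state update sent to Arkindex.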

Action

An action either starts or stops a task.

Start task

To start a task, the Ponos agent will:

Check that the task is not already running. If the task is already running, the Ponos agent will ignore the action.
Check that the task is correctly assigned. If the task is assigned to another Ponos agent, the Ponos agent will ignore the action.
Download the task’s artifacts (those of its parent tasks, using the RetrieveTaskDefinition and ListArtifacts endpoints) and the task’s extra files (such as models), and store them in a specific folder (using the data_dir parameter of its configuration).
Start the task according to the type of Ponos agent used.
Update the task’s state to Running (using the PartialUpdateTask endpoint).
Add the task to its list of running tasks.

If an error occurs during any of the above steps, the Ponos agent will update the task’s state to Error (using the PartialUpdateTask endpoint).
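
The start procedure and its error handling can be sketched as below. The function handle_start and its parameters are hypothetical: prepare stands for the download step, launch for the backend-specific start, and the returned string is the state the caller would send through PartialUpdateTask (None meaning the action is ignored).

```python
from typing import Callable, Dict, Optional


def handle_start(
    task_id: str,
    assigned_agent: str,
    agent_id: str,
    running: Dict[str, str],
    prepare: Callable[[str], None],
    launch: Callable[[str], str],
) -> Optional[str]:
    """Return the state to report for a start action, or None to ignore it."""
    if task_id in running:
        return None  # already running: ignore the action
    if assigned_agent != agent_id:
        return None  # assigned to another agent: ignore the action
    try:
        prepare(task_id)         # download artifacts and extra files
        handle = launch(task_id) # backend-specific start
    except Exception:
        # Any failure in the steps above is reported as an Error state.
        return "Error"
    running[task_id] = handle    # now tracked as a running task
    return "Running"
```

Returning the state instead of calling the API directly keeps the decision logic easy to test in isolation; a real agent may organize this differently.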

Stop task

To stop a task, the Ponos agent will:

Check whether the task is still running. If it is, the Ponos agent will stop it. The task will no longer be listed as a running task.
Update the task’s state to Stopped (using the PartialUpdateTask endpoint).
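
A matching sketch for the stop action, with the same caveats: handle_stop and the stop callable are hypothetical names, and the returned state is what the caller would send through PartialUpdateTask.

```python
from typing import Callable, Dict


def handle_stop(
    task_id: str,
    running: Dict[str, str],
    stop: Callable[[str], None],
) -> str:
    """Stop the task if it is still running, then report the Stopped state."""
    if task_id in running:
        # Remove the task from the running list and stop it on the backend.
        stop(running.pop(task_id))
    return "Stopped"
```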

Generic Ponos agent

The Ponos agent lifecycle described above is as generic as possible. Two agents currently implement this lifecycle: one using Docker Engine for standard hardware, and one using Slurm for supercomputers.
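
One way to picture this split between a generic lifecycle and backend-specific behavior is an abstract base class with a few hooks. The interface below is purely illustrative: BaseAgent, its method names and the InMemoryAgent toy backend are assumptions, not the actual Ponos classes, and no Docker or Slurm calls are shown.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class BaseAgent(ABC):
    """Backend-independent part of the lifecycle (hypothetical interface)."""

    def is_ready(self) -> bool:
        # Ready by default; a backend may override this condition.
        return True

    @abstractmethod
    def start_task(self, task_id: str) -> None:
        """Launch the task on the backend."""

    @abstractmethod
    def stop_task(self, task_id: str) -> None:
        """Stop the task on the backend."""

    @abstractmethod
    def exit_code(self, task_id: str) -> Optional[int]:
        """Return the exit code, or None while the task is still running."""


class InMemoryAgent(BaseAgent):
    """Toy backend used only to show how the hooks plug in."""

    def __init__(self) -> None:
        self.codes: Dict[str, Optional[int]] = {}

    def start_task(self, task_id: str) -> None:
        self.codes[task_id] = None  # started, no exit code yet

    def stop_task(self, task_id: str) -> None:
        self.codes.pop(task_id, None)

    def exit_code(self, task_id: str) -> Optional[int]:
        return self.codes[task_id]
```

With this shape, the setup and loop logic would live once in the base class, and each backend would only supply the readiness condition and the start, stop and status operations.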