Agent Lifecycle

This document explains how Parallax discovers, spawns, manages, and cleans up AI coding agents throughout their lifecycle.

Overview

Agent Discovery

Agents can be discovered through multiple mechanisms:

1. etcd Registry

Production deployments use etcd for service discovery:

// Agent registers with etcd
await registry.register('agent', {
  id: 'claude-agent-1',
  name: 'Claude Engineer',
  endpoint: 'http://agent-1:8080',
  metadata: {
    type: 'claude',
    capabilities: ['implementation', 'typescript', 'testing'],
    expertise: 0.85,
  },
});
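etcd registrations are commonly tied to a lease with a TTL. An agent's entry then disappears on its own if the agent stops renewing the lease. The sketch below models that pattern in memory so it runs without an etcd cluster. `LeaseRegistry`, the `service/id` key layout, and the explicit `now` parameter are hypothetical. Whether Parallax's registry uses leases this way is an assumption.

```typescript
// Hypothetical in-memory model of lease-based registration (not Parallax's API).
interface AgentInfo {
  id: string;
  name: string;
  endpoint: string;
  metadata?: Record<string, unknown>;
}

interface LeasedEntry {
  info: AgentInfo;
  expiresAt: number; // epoch ms at which the lease lapses
}

class LeaseRegistry {
  private readonly entries = new Map<string, LeasedEntry>();

  constructor(private readonly ttlMs: number) {}

  register(service: string, info: AgentInfo, now: number = Date.now()): void {
    this.entries.set(`${service}/${info.id}`, { info, expiresAt: now + this.ttlMs });
  }

  // Agents renew periodically; returns false once the lease has already lapsed.
  keepAlive(service: string, id: string, now: number = Date.now()): boolean {
    const entry = this.entries.get(`${service}/${id}`);
    if (!entry || entry.expiresAt <= now) return false;
    entry.expiresAt = now + this.ttlMs;
    return true;
  }

  // Returns agents with live leases and drops expired entries as it goes.
  list(service: string, now: number = Date.now()): AgentInfo[] {
    const live: AgentInfo[] = [];
    this.entries.forEach((entry, key) => {
      if (entry.expiresAt <= now) {
        this.entries.delete(key);
      } else if (key.startsWith(`${service}/`)) {
        live.push(entry.info);
      }
    });
    return live;
  }
}
```

This sketch checks expiry lazily on read. etcd itself expires the lease on the server and deletes every key attached to it.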

2. Local Agent Manager

For development, agents can be configured through the `PARALLAX_LOCAL_AGENTS` environment variable, which holds a JSON array of agent definitions:

# Configure local agents
export PARALLAX_LOCAL_AGENTS='[
  {
    "id": "local-claude",
    "name": "Local Claude",
    "endpoint": "http://localhost:8080",
    "capabilities": ["implementation", "code_review"]
  }
]'
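Because the variable holds a JSON array, a loader must parse and validate it before use. The following is a minimal sketch. `parseLocalAgents` and its validation rules are hypothetical. The caller passes in `process.env.PARALLAX_LOCAL_AGENTS`, which keeps the function free of I/O.

```typescript
// Hypothetical loader for PARALLAX_LOCAL_AGENTS; field names follow the example above.
interface LocalAgent {
  id: string;
  name: string;
  endpoint: string;
  capabilities: string[];
}

function parseLocalAgents(raw: string | undefined): LocalAgent[] {
  if (!raw || raw.trim() === '') return []; // variable unset: no local agents

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error(`PARALLAX_LOCAL_AGENTS is not valid JSON: ${(err as Error).message}`);
  }
  if (!Array.isArray(parsed)) {
    throw new Error('PARALLAX_LOCAL_AGENTS must be a JSON array');
  }

  return parsed.map((item, index) => {
    const a = item as Partial<LocalAgent>;
    if (typeof a.id !== 'string' || typeof a.endpoint !== 'string') {
      throw new Error(`Agent at index ${index} needs string "id" and "endpoint"`);
    }
    return {
      id: a.id,
      name: typeof a.name === 'string' ? a.name : a.id, // fall back to the id
      endpoint: a.endpoint,
      capabilities: Array.isArray(a.capabilities) ? a.capabilities.map(String) : [],
    };
  });
}
```

Falling back to `id` when `name` is missing is a choice made in this sketch. It is not documented Parallax behavior.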

3. Direct Registration

Agents can be registered directly with the Pattern Engine:

patternEngine.registerLocalAgents([
  {
    id: 'test-agent-1',
    name: 'Test Agent',
    address: 'http://localhost:9001',
    capabilities: ['testing'],
  },
]);

Agent Selection

When a pattern executes, agents are selected according to the requirements the pattern declares:

Capability Matching

// Pattern requires these capabilities
pattern.agents = {
  capabilities: ['implementation', 'typescript'],
  minConfidence: 0.7,
};

// Agent selection keeps only agents that have every required capability
const candidates = agents.filter(agent =>
  pattern.agents.capabilities.every(cap =>
    agent.capabilities.includes(cap)
  )
);
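The filter above checks capabilities only. The sketch below also applies `minConfidence` and prefers stronger agents. It assumes `minConfidence` is compared against the `expertise` score from the agent's registry metadata, which this page does not confirm. `selectAgents`, `CandidateAgent`, and the `count` parameter are hypothetical.

```typescript
// Hypothetical selection step; assumes `expertise` (0..1) is what minConfidence gates.
interface CandidateAgent {
  id: string;
  capabilities: string[];
  expertise: number;
}

interface AgentRequirements {
  capabilities: string[];
  minConfidence?: number;
}

function selectAgents(
  agents: CandidateAgent[],
  req: AgentRequirements,
  count: number,
): CandidateAgent[] {
  const min = req.minConfidence ?? 0;
  return agents
    // Keep agents that have every required capability...
    .filter(agent => req.capabilities.every(cap => agent.capabilities.includes(cap)))
    // ...and that meet the confidence floor.
    .filter(agent => agent.expertise >= min)
    // Highest expertise first, then take the requested number.
    .sort((a, b) => b.expertise - a.expertise)
    .slice(0, count);
}
```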

Agent Spawning

When not enough agents are available, Parallax can spawn new ones via the Agent Runtime Service.

Runtime Types

Runtime	How It Spawns	Use Case
Local	PTY process	Development
Docker	Container	Production (single host)
Kubernetes	Pod	Production (scaled)
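Deciding how many agents to spawn comes down to comparing the number a pattern requires with the number selected. The sketch below builds one spawn request per missing agent. It names each one `<executionId>-agent-<n>` to match the `agentId` format in the logging example later on this page. `planSpawns`, `SpawnRequest`, and the numbering scheme are assumptions.

```typescript
// Hypothetical spawn planner: one request per missing agent.
type RuntimeKind = 'local' | 'docker' | 'kubernetes';

interface SpawnRequest {
  id: string;
  type: 'claude' | 'codex' | 'gemini' | 'aider';
  capabilities: string[];
  runtime: RuntimeKind;
}

function planSpawns(
  executionId: string,
  required: number,
  available: number,
  template: Omit<SpawnRequest, 'id'>,
): SpawnRequest[] {
  const missing = Math.max(0, required - available);
  const requests: SpawnRequest[] = [];
  for (let i = 0; i < missing; i++) {
    // IDs are scoped to the execution so cleanup can find these agents later.
    requests.push({
      ...template,
      capabilities: [...template.capabilities], // don't share the template's array
      id: `${executionId}-agent-${i}`,
    });
  }
  return requests;
}
```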

Spawn Flow

Agent Configuration

interface AgentConfig {
  // Identity
  id?: string;                    // Auto-generated if not provided
  name: string;                   // Human-readable name
  type: AgentType;                // 'claude' | 'codex' | 'gemini' | 'aider'

  // Capabilities
  capabilities: string[];         // What this agent can do
  role?: string;                  // Org role: architect, engineer, etc.

  // Environment
  workdir?: string;               // Working directory
  env?: Record<string, string>;   // Environment variables

  // Credentials
  credentials?: {
    anthropicKey?: string;        // For Claude
    openaiKey?: string;           // For Codex
    googleKey?: string;           // For Gemini
    githubToken?: string;         // For repo access
  };

  // Resources (containerized runtimes)
  resources?: {
    cpu?: string;                 // e.g., "1" or "500m"
    memory?: string;              // e.g., "2Gi"
    timeout?: number;             // Max lifetime in seconds
  };
}
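The credential comments above pair each agent type with one key. A pre-spawn check could fail fast when that key is missing, as in this sketch. `missingCredentials` and the `needsRepoAccess` flag are hypothetical. Mapping Aider to no single required key is also an assumption, because Aider can be pointed at several model providers.

```typescript
// Hypothetical pre-spawn check based on the AgentConfig credential comments.
type AgentType = 'claude' | 'codex' | 'gemini' | 'aider';

interface Credentials {
  anthropicKey?: string;
  openaiKey?: string;
  googleKey?: string;
  githubToken?: string;
}

// Credential field each agent type needs; null = no single required key (assumption).
const requiredCredential: Record<AgentType, keyof Credentials | null> = {
  claude: 'anthropicKey',
  codex: 'openaiKey',
  gemini: 'googleKey',
  aider: null,
};

// Returns the names of missing credential fields (empty array = OK to spawn).
function missingCredentials(
  config: { type: AgentType; credentials?: Credentials },
  needsRepoAccess = false,
): string[] {
  const missing: string[] = [];
  const field = requiredCredential[config.type];
  if (field !== null && !config.credentials?.[field]) missing.push(field);
  if (needsRepoAccess && !config.credentials?.githubToken) missing.push('githubToken');
  return missing;
}
```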

Docker Images

Parallax provides pre-built Docker images for each agent type:

Image	CLI Installed	Size
`parallax/agent-base`	Common tools	~200MB
`parallax/agent-claude`	Claude Code	~250MB
`parallax/agent-codex`	OpenAI Codex	~250MB
`parallax/agent-gemini`	Google Gemini	~250MB
`parallax/agent-aider`	Aider	~300MB

# Build all agent images
cd packages/runtime-docker
pnpm docker:build

# Run Claude agent manually
docker run -it --rm \
  -e ANTHROPIC_API_KEY=sk-... \
  -v "$(pwd)":/workspace \
  parallax/agent-claude
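A Docker runtime can turn an agent configuration into the kind of `docker run` invocation shown above. This sketch builds only the argument list. The image names follow the table above, and the `/workspace` mount target follows the manual example. The environment-variable names other than `ANTHROPIC_API_KEY` (`OPENAI_API_KEY`, `GEMINI_API_KEY`, `GITHUB_TOKEN`) are assumptions about what the images expect. `dockerRunArgs` is hypothetical.

```typescript
// Hypothetical: build `docker run` arguments for an agent container.
type AgentType = 'claude' | 'codex' | 'gemini' | 'aider';

interface DockerAgentConfig {
  type: AgentType;
  workdir?: string;
  env?: Record<string, string>;
  credentials?: {
    anthropicKey?: string;
    openaiKey?: string;
    googleKey?: string;
    githubToken?: string;
  };
}

function dockerRunArgs(config: DockerAgentConfig): string[] {
  const args = ['run', '--rm'];

  // Credential -> env var mapping; names other than ANTHROPIC_API_KEY are assumptions.
  const creds = config.credentials ?? {};
  const credentialEnv: Record<string, string | undefined> = {
    ANTHROPIC_API_KEY: creds.anthropicKey,
    OPENAI_API_KEY: creds.openaiKey,
    GEMINI_API_KEY: creds.googleKey,
    GITHUB_TOKEN: creds.githubToken,
  };

  // Start from user-supplied env; explicit credentials override same-named entries.
  const env: Record<string, string> = { ...config.env };
  for (const name of Object.keys(credentialEnv)) {
    const value = credentialEnv[name];
    if (value !== undefined) env[name] = value;
  }
  for (const name of Object.keys(env)) {
    args.push('-e', `${name}=${env[name]}`);
  }

  // Mount the working directory at /workspace, as in the manual example above.
  if (config.workdir) args.push('-v', `${config.workdir}:/workspace`);

  args.push(`parallax/agent-${config.type}`);
  return args;
}
```

Passing secrets as `-e NAME=value` exposes them in the host's process list. Docker's `-e NAME` form (value taken from the caller's environment) or `--env-file` avoids that.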

Agent Communication

Agents communicate via gRPC or HTTP:

gRPC Protocol

service Agent {
  rpc ExecuteTask(TaskRequest) returns (TaskResponse);
  rpc HealthCheck(HealthRequest) returns (HealthResponse);
  rpc StreamOutput(OutputRequest) returns (stream OutputChunk);
}

message TaskRequest {
  string task_id = 1;
  string description = 2;
  bytes data = 3;
  int32 timeout_ms = 4;
}

message TaskResponse {
  string task_id = 1;
  bytes result = 2;
  double confidence = 3;
  string reasoning = 4;
}

HTTP Protocol

POST /execute HTTP/1.1
Content-Type: application/json

{
  "taskId": "task-123",
  "description": "Implement user authentication",
  "data": { "requirements": "..." },
  "timeout": 30000
}

HTTP/1.1 200 OK
Content-Type: application/json

{
  "taskId": "task-123",
  "result": { "code": "...", "files": [...] },
  "confidence": 0.87,
  "reasoning": "Implemented OAuth2 flow..."
}
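A caller of the HTTP endpoint only needs to POST the JSON body and enforce the timeout on its own side. In this sketch, `executeTask` and the `FetchLike` type are hypothetical. The fetch function is injected so the code can be tested without a network. Aborting the request once `timeout` elapses is an assumption about how a client should treat that field.

```typescript
// Hypothetical client for POST /execute, using the request/response shapes above.
interface ExecuteRequest {
  taskId: string;
  description: string;
  data?: unknown;
  timeout: number; // milliseconds
}

interface ExecuteResponse {
  taskId: string;
  result: unknown;
  confidence: number;
  reasoning?: string;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function executeTask(
  endpoint: string,
  request: ExecuteRequest,
  fetchImpl: FetchLike,
): Promise<ExecuteResponse> {
  const controller = new AbortController();
  // Abort on the client side if the agent overruns the task timeout.
  const timer = setTimeout(() => controller.abort(), request.timeout);
  try {
    const res = await fetchImpl(`${endpoint}/execute`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(request),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`Agent returned HTTP ${res.status}`);
    const body = (await res.json()) as ExecuteResponse;
    if (body.taskId !== request.taskId) throw new Error('Response taskId does not match request');
    return body;
  } finally {
    clearTimeout(timer); // don't leave the timer running after completion
  }
}
```

In real use, pass the global `fetch` (Node 18+ or a browser) as `fetchImpl`.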

Agent Health Monitoring

The runtime continuously monitors agent health:

Health Check Response

interface HealthResponse {
  healthy: boolean;
  message?: string;
  runtime?: {
    name: string;
    type: string;
    activeAgents: number;
  };
}
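A single failed health check is often transient, so monitors usually require several consecutive failures before they mark an agent unhealthy. The sketch below applies that rule to `HealthResponse` results. `HealthTracker` and the default threshold of 3 are hypothetical and do not reflect Parallax's configured behavior.

```typescript
// Hypothetical consecutive-failure tracker over HealthResponse results.
interface HealthResponse {
  healthy: boolean;
  message?: string;
}

type HealthState = 'healthy' | 'degraded' | 'unhealthy';

class HealthTracker {
  private readonly failures = new Map<string, number>();

  constructor(private readonly failureThreshold = 3) {}

  // Record one probe result; `null` means the probe itself failed (timeout, refused).
  record(agentId: string, response: HealthResponse | null): HealthState {
    if (response?.healthy) {
      this.failures.delete(agentId); // any success resets the failure streak
      return 'healthy';
    }
    const count = (this.failures.get(agentId) ?? 0) + 1;
    this.failures.set(agentId, count);
    return count >= this.failureThreshold ? 'unhealthy' : 'degraded';
  }
}
```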

Agent Cleanup

Agents are cleaned up after pattern execution or on shutdown:

After Execution

// Pattern Engine cleanup
private async cleanupSpawnedAgents(executionId: string): Promise<void> {
  const agents = this.spawnedAgents.get(executionId);
  if (!agents) return;

  for (const agent of agents) {
    try {
      await this.agentRuntimeService.stop(agent.id);
    } catch (error) {
      // Log and continue so one failure doesn't leak the remaining agents
      this.logger.warn({ agentId: agent.id, err: error }, 'Failed to stop agent');
    }
  }

  this.spawnedAgents.delete(executionId);
}

Graceful Shutdown
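A graceful shutdown can reuse the `stop(agentId, options)` call from the Docker runtime below. First request a normal stop for every agent concurrently, then force-stop any that fail. This is a sketch. The `StopFn` type and the `shutdownAgents` name are hypothetical, and wiring it to a process signal such as `SIGTERM` is left out.

```typescript
// Hypothetical shutdown helper built on stop(agentId, { force, timeout }).
interface StopOptions {
  force?: boolean;
  timeout?: number; // seconds to wait before the runtime kills the agent
}

type StopFn = (agentId: string, options?: StopOptions) => Promise<void>;

// Resolves to the IDs of agents that could not be stopped at all.
async function shutdownAgents(
  agentIds: string[],
  stop: StopFn,
  gracefulTimeoutSec = 10,
): Promise<string[]> {
  // Phase 1: ask every agent to stop gracefully, in parallel.
  const graceful = await Promise.all(
    agentIds.map(id =>
      stop(id, { timeout: gracefulTimeoutSec }).then(
        () => null,
        () => id, // remember agents whose graceful stop failed
      ),
    ),
  );
  const stragglers = graceful.filter((id): id is string => id !== null);

  // Phase 2: force-stop whatever is left.
  const forced = await Promise.all(
    stragglers.map(id =>
      stop(id, { force: true }).then(
        () => null,
        () => id,
      ),
    ),
  );
  return forced.filter((id): id is string => id !== null);
}
```

Docker's own stop already escalates from SIGTERM to SIGKILL after `t` seconds. The second phase therefore mainly covers runtimes or API calls that fail outright.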

Container Cleanup (Docker)

// Docker runtime stops and removes containers
async stop(agentId: string, options?: StopOptions): Promise<void> {
  const container = this.containers.get(agentId);
  if (!container) return; // unknown or already-removed agent: nothing to do

  if (options?.force) {
    await container.kill();                               // SIGKILL immediately
  } else {
    await container.stop({ t: options?.timeout ?? 10 });  // SIGTERM, then SIGKILL after t seconds
  }

  await container.remove();
  this.containers.delete(agentId);
}

Metrics and Observability

Agent Metrics

interface AgentMetrics {
  cpu: number;           // CPU usage percentage
  memory: number;        // Memory in bytes
  uptime: number;        // Milliseconds since start
  messageCount: number;  // Messages processed
}
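For the Docker runtime, `cpu` can be derived from the Docker Engine stats endpoint the same way `docker stats` computes it. Divide the container's CPU-time delta by the host's CPU-time delta, then scale by the number of online CPUs. That Parallax computes it exactly this way is an assumption. The interfaces mirror the Engine API's `cpu_stats`/`precpu_stats` objects, trimmed to the fields the formula needs.

```typescript
// Subset of the Docker Engine API container stats payload needed for CPU %.
interface CpuStats {
  cpu_usage: { total_usage: number }; // container CPU time, nanoseconds
  system_cpu_usage: number;           // host CPU time, nanoseconds
  online_cpus: number;
}

interface ContainerStats {
  cpu_stats: CpuStats;    // current sample
  precpu_stats: CpuStats; // previous sample
}

// CPU usage percentage as `docker stats` reports it (100% = one full core).
function cpuPercent(stats: ContainerStats): number {
  const cpuDelta =
    stats.cpu_stats.cpu_usage.total_usage - stats.precpu_stats.cpu_usage.total_usage;
  const systemDelta = stats.cpu_stats.system_cpu_usage - stats.precpu_stats.system_cpu_usage;
  if (systemDelta <= 0 || cpuDelta < 0) return 0; // first sample or counter reset
  return (cpuDelta / systemDelta) * stats.cpu_stats.online_cpus * 100;
}
```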

Tracing

Agent operations are traced with OpenTelemetry:

Trace: pattern-execution
├── Span: select-agents
│   ├── Attribute: agent.count = 3
│   └── Attribute: capabilities = ["implementation"]
├── Span: spawn-agent (if needed)
│   ├── Attribute: agent.type = "claude"
│   └── Attribute: runtime = "docker"
├── Span: execute-task
│   ├── Attribute: agent.id = "claude-1"
│   ├── Attribute: confidence = 0.87
│   └── Attribute: duration_ms = 2500
└── Span: cleanup-agents

Logging

{
  "level": "info",
  "time": "2024-01-15T10:30:00.000Z",
  "msg": "Agent spawned for pattern execution",
  "agentId": "exec-123-agent-0",
  "patternName": "code-review",
  "runtime": "docker",
  "type": "claude"
}

Best Practices

1. Pre-warm Agents

For latency-sensitive workloads, keep a minimum number of agents running so patterns do not wait for a spawn:

# Keep minimum agents ready
agents:
  warmPool:
    claude: 2
    aider: 1
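Maintaining a warm pool means periodically comparing the configured count for each agent type with the idle agents of that type, then spawning or retiring the difference. The sketch below computes that plan. `reconcileWarmPool` and `PoolPlan` are hypothetical. Only the `warmPool` key and its per-type counts come from the config snippet above.

```typescript
// Hypothetical reconciliation step for the warmPool setting above.
type PoolCounts = Record<string, number>;

interface PoolPlan {
  spawn: PoolCounts;  // agent type -> how many to start
  retire: PoolCounts; // agent type -> how many idle agents to stop
}

function reconcileWarmPool(desired: PoolCounts, idle: PoolCounts): PoolPlan {
  const plan: PoolPlan = { spawn: {}, retire: {} };
  // Every type mentioned on either side, without duplicates.
  const types = Object.keys(desired)
    .concat(Object.keys(idle))
    .filter((t, i, all) => all.indexOf(t) === i);

  for (const type of types) {
    const diff = (desired[type] ?? 0) - (idle[type] ?? 0);
    if (diff > 0) plan.spawn[type] = diff;
    else if (diff < 0) plan.retire[type] = -diff;
  }
  return plan;
}
```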

2. Use Appropriate Runtimes

Scenario	Recommended Runtime
Local development	Local
CI/CD pipelines	Docker
Production (single node)	Docker
Production (scaled)	Kubernetes

3. Set Resource Limits

Cap CPU, memory, and lifetime so a runaway agent cannot exhaust the host:

{
  resources: {
    cpu: "1",
    memory: "2Gi",
    timeout: 300  // 5 minute max
  }
}
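`cpu` and `memory` use Kubernetes quantity notation: `"1"` is one core, `"500m"` is 500 millicores, and `"2Gi"` is 2 × 2^30 bytes. A runtime that enforces these limits outside Kubernetes, for example through Docker's `--cpus` and `--memory` flags, must convert them first. The sketch handles only common suffixes (`m` for CPU; `k`/`M`/`G` and `Ki`/`Mi`/`Gi` for memory). The function names are hypothetical.

```typescript
// Hypothetical converters for AgentConfig.resources strings (Kubernetes quantity notation).

// "1" -> 1 core, "500m" -> 0.5 cores (the unit `docker run --cpus` expects).
function parseCpu(quantity: string): number {
  const match = /^(\d+(?:\.\d+)?)(m?)$/.exec(quantity.trim());
  if (!match) throw new Error(`Unsupported CPU quantity: ${quantity}`);
  const value = parseFloat(match[1]);
  return match[2] === 'm' ? value / 1000 : value;
}

// Multipliers for the memory suffixes handled here (a subset of Kubernetes's).
const memoryMultipliers: Record<string, number> = {
  k: 1e3, M: 1e6, G: 1e9,                            // decimal (SI)
  Ki: 1024, Mi: 1024 * 1024, Gi: 1024 * 1024 * 1024, // binary
};

// "2Gi" -> 2147483648 bytes (`docker run --memory` accepts a plain byte count).
function parseMemory(quantity: string): number {
  const match = /^(\d+(?:\.\d+)?)(Ki|Mi|Gi|k|M|G)?$/.exec(quantity.trim());
  if (!match) throw new Error(`Unsupported memory quantity: ${quantity}`);
  const multiplier = match[2] ? memoryMultipliers[match[2]] : 1;
  return Math.round(parseFloat(match[1]) * multiplier);
}
```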

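4. Handle Login Gracefully

Agents may require authentication. Handle the runtime's `login_required` event by supplying credentials or notifying a user: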

agentRuntimeService.on('login_required', async (agent, url) => {
  // Option 1: Provide credentials
  await agent.provideCredentials({ apiKey: '...' });

  // Option 2: Notify user
  notifications.send(`Agent ${agent.name} requires login: ${url}`);
});

Next Steps

Workspace Service - Git workspace provisioning
Agent Runtimes - Runtime configuration
Docker Images - Building and customizing images
