# Local Agent Orchestration Guide
Use this guide when you want to run planner-worker orchestration on one machine first, validate behavior quickly, and exercise the same scripted planner and isolated-worker patterns you can later carry into broader multi-pod or cloud gates.
## What You Will Run
| Component | Purpose |
|---|---|
| `nfltr mcp` | Planner-side MCP bridge for discovery, capacity checks, explicit worker choice, task control, and recovery reads. |
| `nfltr worker` | Worker agent that exposes labels and capacity, then executes task calls via a local MCP command. |
| `cmd/tools/mcp-stdio-call` | Optional scripted stdio driver that keeps one planner session alive across multiple MCP tool calls. |
| Local task store | Persists planner task snapshots and events for local restart and replay behavior. |
| Dashboard stack (optional) | Operator-facing validation of managed capabilities and route state. |
For one planner process on one machine, local task storage is enough for normal orchestration cycles. Shared persistence is only needed when planners restart under supervisors, move hosts, or need cross-process task inspection.
## Prerequisites
- Docker and Docker Compose
- Repo checkout with `make` targets available
- Built binaries when running commands directly (`make build`)
## Path A: Fast Local Orchestration Validation
Run the standalone orchestration smoke sweep:
```shell
make smoke-tests-orchestration
```
This validates local planner-worker orchestration flows, including:
- task dispatch and status transitions
- worker preflight and capacity behavior
- resume/recovery paths covered by local orchestration smoke targets
If you want the complete local verification gate (lint, tests, and repeated orchestration smokes):
```shell
make verify-orchestration
```
If you want one scripted planner session instead of an interactive MCP client, run:
```shell
cat <<'EOF' | go run ./cmd/tools/mcp-stdio-call \
  --command "./bin/nfltr mcp --proxy-url https://nfltr.xyz"
{"tool":"list_worker_peers","arguments":{}}
EOF
```
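To consume the driver's output programmatically, a minimal Python sketch can split it into JSON objects. This assumes the driver emits one JSON result object per line, and the `peers` schema shown here (`agent_id`, `labels`) is an illustrative assumption, not a documented shape:

```python
import json

# Hypothetical one-object-per-line driver output; the "peers" result schema
# (agent_id, labels) is an assumption for illustration only.
sample_output = (
    '{"tool":"list_worker_peers",'
    '"result":{"peers":[{"agent_id":"repo-ux-worker-a","labels":{"pool":"repo-ux"}}]}}'
)

peers_seen = []
for line in sample_output.splitlines():
    reply = json.loads(line)
    if reply.get("tool") == "list_worker_peers":
        for peer in reply["result"]["peers"]:
            peers_seen.append((peer["agent_id"], peer.get("labels", {})))
            print(peer["agent_id"], peer.get("labels", {}))
```

The same per-line parse works for any later tool reply in the session, since the driver keeps one planner session alive across calls.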
## Path B: Local Dashboard Orchestration Workflow
Start the local dashboard capabilities stack:
```shell
docker compose -f docker/docker-compose-smoke-dashboard-capabilities-dual-postgres.yaml \
  up --build -d postgres migration-job mock-oauth rpc-server-1 rpc-server-2 backend-1 backend-2 homelab-agent
```
Open the dashboard:
- Visit http://localhost:18080/dashboard
- Sign in through the mock OAuth flow
- Use Manage on an endpoint to stage capability intents
You can also run the same end-to-end dashboard capability check through the repo target:
```shell
make smoke-test-dashboard-capabilities-dual-postgres
```
## Planner Preflight Pattern (Recommended)
Before fan-out or verification loops, keep local orchestration deterministic with this advisory sequence:
1. Run `list_worker_peers` to gather candidate workers and inspect their labels, tool name, and advertised slots.
2. Run `ensure_worker_capacity` to confirm enough ready workers are available for the plan you are about to dispatch.
3. Run `probe_worker` when live capacity matters before dispatching a more expensive task.
4. If there is a shortage, run `propose_managed_worker_intents` to generate advisory payloads for manual approval and explicit upsert.
The planner decides which worker should run a task. Capacity checks and intent proposals stay advisory until you explicitly execute managed-capability writes. Use `rank_worker_peers` only when you want one reusable compatibility ranking view, not as the mainline worker-selection path.
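The preflight sequence can be scripted as NDJSON input for `cmd/tools/mcp-stdio-call`. The tool names below come from this guide; the argument fields (`min_ready`, `agent_id`) are illustrative assumptions, not a documented schema:

```python
import json

# Tool names match the preflight steps above; the argument shapes are
# assumptions for illustration.
preflight_calls = [
    {"tool": "list_worker_peers", "arguments": {}},
    {"tool": "ensure_worker_capacity", "arguments": {"min_ready": 2}},
    {"tool": "probe_worker", "arguments": {"agent_id": "repo-ux-worker-a"}},
]

# Emit one JSON object per line, suitable for piping into the stdio driver.
ndjson = "\n".join(json.dumps(call) for call in preflight_calls)
print(ndjson)
```

Writing the sequence out this way keeps the planner session deterministic and replayable: the same NDJSON file can be piped into the driver on every run.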
## Path C: Local Repo Improvement Workflow
For the most concrete local-first workflow, keep the coordinator checkout and worker checkouts separate:
- one coordinator clone where you inspect returned patches and rerun final verification
- one isolated clone per worker so the worker never writes directly into your main checkout
- one planner session driven through `cmd/tools/mcp-stdio-call`
For the full repo-focused walkthrough, use Local Repo Improvement with NFLTR Workers. It expands this path into the complete safety model, NDJSON planner sequence, patch inspection rules, and coordinator-side verification loop.
Start a worker on an isolated clone with git-backed patch return enabled:
```shell
AGENT_ID=repo-ux-worker-a \
./bin/nfltr worker \
  --name repo-ux-worker-a \
  --server grpc.nfltr.xyz:443 \
  --api-key "$NFLTR_API_KEY" \
  --labels "pool=repo-ux,track=docs,role=implementer,non_mock=true" \
  --max-tasks 1 \
  --mcp-command "./bin/nfltr copilot-mcp --cwd /path/to/worker-a --git-code-result --timeout 20m --copilot-arg=--stream=off"
```
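If you generate worker launch commands, the `--labels` value is a comma-separated `key=value` list. A minimal sketch for building it from a dict (the keys mirror the example above):

```python
# Build the --labels flag value; the keys mirror the worker example above.
labels = {"pool": "repo-ux", "track": "docs", "role": "implementer", "non_mock": "true"}
label_flag = ",".join(f"{key}={value}" for key, value in labels.items())
print(label_flag)  # pool=repo-ux,track=docs,role=implementer,non_mock=true
```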
Then drive one planner session with the checked-in NDJSON template:
```shell
go run ./cmd/tools/mcp-stdio-call \
  --command "./bin/nfltr mcp --proxy-url https://nfltr.xyz" \
  --stderr-file /tmp/local-repo-improvement.stderr \
  < cmd/tools/mcp-stdio-call/examples/local-repo-improvement.ndjson \
  > /tmp/local-repo-improvement.results.ndjson
```
That flow sends a repo archive over `artifact_files`, waits for a terminal task state, and then lets you apply the returned patch from `result.artifacts[0].path` (or the fallback `result.result.patch_path`) in the coordinator clone.
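Selecting the patch path from a terminal task result can be sketched like this, using the two result locations named above (`result.artifacts[0].path`, with `result.result.patch_path` as the fallback); the surrounding JSON envelope and the sample paths are assumptions for illustration:

```python
import json

# Illustrative terminal-task result line; only the two patch-path locations
# named in this guide are meaningful, the rest of the envelope is assumed.
result_line = (
    '{"result":{"artifacts":[{"path":"/tmp/patches/task-1.patch"}],'
    '"result":{"patch_path":"/tmp/fallback.patch"}}}'
)

result = json.loads(result_line)["result"]
artifacts = result.get("artifacts") or []
# Prefer result.artifacts[0].path; fall back to result.result.patch_path.
patch_path = artifacts[0]["path"] if artifacts else result["result"]["patch_path"]
print(patch_path)
```

Apply the chosen patch in the coordinator clone (for example with `git apply`) and rerun final verification there, never in the worker checkout.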
If you want the full same-VM example, run `scripts/smoke-test-mcp-isolated-workers.sh`. It launches two isolated workers, dispatches two implementation tasks through one planner session, applies the returned patch artifacts to a coordinator clone, and reruns `go test ./...` locally.
## Troubleshooting
| Symptom | Likely cause | Action |
|---|---|---|
| Live orchestration tools fail fast | Planner bridge is running without a usable A2A connection | Confirm the `--server` target and API key, then retry `list_worker_peers`. |
| `orchestrate_task` is rejected before dispatch | The planner did not pass a concrete `agent_id` or ordered `candidate_agent_ids` | Pick the worker explicitly after `list_worker_peers`, `ensure_worker_capacity`, and `probe_worker`. |
| No workers appear ready | Label mismatch or zero available slots | Inspect discovery and capacity output, then adjust labels or worker concurrency. |
| Task completes without a usable patch path | The worker returned prose instead of a git-backed patch artifact | Tighten `allowed_paths`, keep `require_changed_paths=true`, and ask for a git-backed patch result explicitly. |
| Dashboard capability save appears stale | Agent is offline and intent is staged only | Bring the managed agent online; staged intents reconcile on poll. |
## Teardown
```shell
docker compose -f docker/docker-compose-smoke-dashboard-capabilities-dual-postgres.yaml down -v
```