Orchestration — Control Tower Guide

The dashboard pairs a Control Tower Snapshot (fleet readiness and capability coverage) with the My Active Orchestrations and DAG run explorer panels. It shows live task state pulled from each connected planner, and its action buttons queue commands that the planner picks up within about two seconds.

How it fits together

The planner runs locally as nfltr mcp. It pushes a digest of every task it owns to nfltr.xyz every ~5 seconds, and it polls the relay for commands every ~2 seconds. The dashboard never runs an orchestrator itself — it reads the digest and enqueues commands. This means the dashboard works against any planner that can reach the relay, including planners behind NAT or on your laptop.
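
The two cadences can be pictured as one loop that ticks once per second. The sketch below is illustrative only: planner_actions is a hypothetical helper, not part of nfltr, and the intervals are the approximate values quoted above.

```shell
# Hypothetical helper: given whole seconds since the planner started,
# print which relay interactions are due on that tick.
planner_actions() {
  tick=$1
  actions=""
  # Command poll: roughly every 2 seconds.
  if [ $((tick % 2)) -eq 0 ]; then
    actions="poll_commands"
  fi
  # Digest push: roughly every 5 seconds.
  if [ $((tick % 5)) -eq 0 ]; then
    actions="${actions:+$actions }push_digest"
  fi
  printf '%s\n' "$actions"
}

# Print the schedule for the first ten seconds.
t=0
while [ "$t" -le 10 ]; do
  printf '%2ds: %s\n' "$t" "$(planner_actions "$t")"
  t=$((t + 1))
done
```

Both interactions are outbound requests from the planner, which is why nothing has to be able to connect to the planner itself.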


1. Connect A Planner

Install the CLI and start a planner with an API key scoped to your account:

curl -fsSL https://nfltr.xyz/install.sh | sh
export NFLTR_API_KEY="..."   # from dashboard → Settings → API Keys

./bin/nfltr mcp \
  --proxy-url https://nfltr.xyz \
  --api-key "$NFLTR_API_KEY"

Once the planner is up, the dashboard's My Active Orchestrations card flips from "Waiting for first digest…" to live. If nothing appears within ten seconds, check that the planner reached the relay: look for "orchestration command poll failed" warnings in the planner log.

Workers register the same way but with the nfltr worker subcommand and a label set:

./bin/nfltr worker \
  --name repo-fix-worker-a \
  --server grpc.nfltr.xyz:443 \
  --api-key "$NFLTR_API_KEY" \
  --labels "role=implementer,pool=repo-fix" \
  --max-tasks 1 \
  --mcp-command "./bin/nfltr copilot-mcp --cwd /path/to/worker-clone --git-code-result --timeout 20m"

2. Launch Your First Task

In My Active Orchestrations, click + Start. A small inline form opens.

When you click Start, the dashboard generates a client-side task id (prefixed dash-…) and queues a start_task command. Within ~2s the planner picks it up, calls orch.StartTask, and the new task appears in the live view at the bottom of the card.

Why a client-side id?

So you can queue follow-up commands (approve, answer) targeting the same task before the first digest tick lands. The planner uses that exact id when it calls StartTask, so the next command's task_id always resolves.
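
A script can follow the same pattern: mint the id locally, then reuse it for every command aimed at that task. This is a sketch under assumptions. The id format beyond the dash- prefix is not documented here, so the 12 hex characters are an arbitrary choice, and new_task_id and command_payload are hypothetical helpers. Only the task_id and type fields that appear in this guide's curl example are used; a real start_task command presumably carries more (an objective, a worker), which the sketch leaves out rather than guesses.

```shell
# Hypothetical helper: mint a task id with the dash- prefix plus 12 random
# hex characters (the suffix length is an arbitrary choice for this sketch).
new_task_id() {
  printf 'dash-%s\n' "$(od -An -N6 -tx1 /dev/urandom | tr -d ' \n')"
}

# Hypothetical helper: build a command body from a task id and a command type.
command_payload() {
  printf '{"task_id":"%s","type":"%s"}\n' "$1" "$2"
}

task_id=$(new_task_id)

# Both bodies can be built before any digest has arrived, because the id
# was chosen by the client rather than assigned by the planner.
command_payload "$task_id" start_task
command_payload "$task_id" approve_dispatch
```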


3. Read The Live View

Tasks group by lineage_root. Top-level tasks render as group headers; child tasks (workers, verifiers, integrators a planner spawned from a parent) indent under them. Each row carries:

| Element | Meaning |
|---|---|
| State badge | running (green), completed (blue), failed (red), pending (amber), waiting (purple, blocked on a question/approval), unknown (gray, no recent digest). |
| Role chip | planner, worker, verifier, integrator, reducer, loop_controller. Color-coded so you can scan a deep lineage in one pass. |
| Pulse dot | Pulses for ~30s after the planner pushed an update for that task. Lets you tell "moving" from "stalled" without watching timestamps. |
| Relative time | "updated 3s ago"; surfaces stuck planners. If the whole group goes quiet for more than 30s the digest is stale and you should check the planner. |
| Inline alerts | Pending approval, pending question, and error rows render as colored callouts inside the task they belong to, with the action buttons attached. |
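
The timing rules for the pulse dot and relative time can be written as a small classifier. This is a sketch, not dashboard code: freshness is a hypothetical helper, the 30-second threshold comes from the rows above, and the 5-minute cutoff is the relay-side timeout mentioned under Troubleshooting. All three values are approximate in the guide.

```shell
# Hypothetical helper: classify a task row by seconds since its last update.
#   up to 30s   -> "moving"  (pulse dot still active)
#   under 300s  -> "stale"   (planner has gone quiet)
#   beyond that -> "expired" (relay times out live rows after ~5 minutes)
freshness() {
  age=$1
  if [ "$age" -le 30 ]; then
    echo moving
  elif [ "$age" -lt 300 ]; then
    echo stale
  else
    echo expired
  fi
}

for secs in 3 45 600; do
  printf 'updated %ss ago: %s\n' "$secs" "$(freshness "$secs")"
done
```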

4. Steer Mid-Flight

Four controls show up inside task rows when relevant. Each one queues a command at POST /api/v1/orchestration/commands; the planner consumes it within ~2s.

| Button | When it appears | What the planner does |
|---|---|---|
| Approve | Task is paused awaiting dispatch approval (pending_approval=true in the digest). | Calls orch.ApproveTask; the task leaves the approval gate and dispatches. |
| Reject | Same condition as Approve. | Calls orch.RejectTask with the reason you typed (defaults to "rejected from dashboard"). |
| Answer | The worker called ask_question, so the digest carries a pending_question. | Calls orch.Respond(task_id, your_answer); the worker resumes with your text in the response payload. |
| Abort | Available on any non-terminal task. | Calls orch.CancelTask with the reason; in-flight worker calls receive a cancel signal at the next checkpoint. |

The same control plane is exposed at the API level if you want to script approvals from CI:

curl -X POST https://nfltr.xyz/api/v1/orchestration/commands \
  -H "X-Api-Key: $NFLTR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"task_id":"dash-abc123…","type":"approve_dispatch"}'
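
For scripting more than one control, the request above can be wrapped in a small function. This is a sketch under assumptions: command_type and queue_command are hypothetical helpers, and the pairing of buttons to command types is inferred from their names (only approve_dispatch is confirmed by the example above). The body carries just task_id and type; the fields for a reject reason or an answer text are not shown in this guide, so they are left out.

```shell
# Map a dashboard button name to a command type accepted by
# POST /api/v1/orchestration/commands (pairing inferred from the names).
command_type() {
  case "$1" in
    approve) echo approve_dispatch ;;
    reject)  echo reject_dispatch ;;
    answer)  echo answer_question ;;
    abort)   echo abort_task ;;
    *)       echo "unknown button: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: queue a command for a task.
#   $1 = task id, $2 = button name (approve, reject, answer, abort)
queue_command() {
  cmd_type=$(command_type "$2") || return 1
  curl -fsS -X POST "https://nfltr.xyz/api/v1/orchestration/commands" \
    -H "X-Api-Key: $NFLTR_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"task_id\":\"$1\",\"type\":\"$cmd_type\"}"
}

# Usage: queue_command "$TASK_ID" approve
```

The -f flag makes curl exit non-zero on an HTTP error, so a CI step that calls queue_command fails visibly instead of passing silently.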

Use Cases

Use case 1 — Repo bug fix from the dashboard

The fastest path to value. Start a worker on an isolated clone of your repo with --git-code-result, then click + Start with an objective like "Add a regression test for the bug in #1234 and a minimal patch; return a git patch artifact." The worker turns up in the live view, runs to completion, and the planner returns a patch path you can apply in your coordinator clone:

git -C /path/to/coordinator apply < "/path/to/worker/clone/$(jq -r '.result.artifacts[0].path' < result.json)"
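
A cautious variant dry-runs the patch before touching the coordinator clone. The sketch below uses git apply --check, a standard git option that reports whether a patch would apply without changing any files. apply_patch_checked is a hypothetical helper, and it expects an absolute patch path because git -C changes the working directory first.

```shell
# Hypothetical helper: dry-run a patch against the coordinator clone and
# apply it only if it would apply cleanly.
#   $1 = coordinator clone, $2 = absolute path to the patch file
apply_patch_checked() {
  coordinator=$1
  patch=$2
  if git -C "$coordinator" apply --check "$patch"; then
    git -C "$coordinator" apply "$patch"
  else
    echo "patch does not apply cleanly: $patch" >&2
    return 1
  fi
}

# Usage: apply_patch_checked /path/to/coordinator /abs/path/to/result.patch
```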

Deeper walkthrough: Local Repo Improvement with NFLTR Workers.

Use case 2 — Plan → implement → verify → integrate

A planner can spawn child tasks under one lineage_root for a multi-actor flow:

  1. Planner decomposes the objective and dispatches implementer workers, one per slice.
  2. As implementers complete, the planner dispatches a verifier against each returned patch.
  3. If verification passes, an integrator task merges the slices into the coordinator clone.
  4. You watch the entire lineage in one collapsible group. Failed verifications show red badges and the failed patch's path; rejecting from the dashboard re-queues the slice with feedback.
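
The four steps above amount to a small transition table per slice. The sketch below only illustrates that flow; it is not how the planner is implemented. next_role and the outcome names ok and rejected are hypothetical, and implementer is the worker label used earlier in this guide.

```shell
# Hypothetical helper: given the role that just finished and its outcome,
# print the role the planner would dispatch next for that slice.
next_role() {
  case "$1:$2" in
    planner:ok)        echo implementer ;;  # step 1: one implementer per slice
    implementer:ok)    echo verifier ;;     # step 2: verify each returned patch
    verifier:ok)       echo integrator ;;   # step 3: merge into the coordinator clone
    verifier:rejected) echo implementer ;;  # step 4: re-queue the slice with feedback
    integrator:ok)     echo done ;;
    *)                 echo "no transition for $1:$2" >&2; return 1 ;;
  esac
}

# Walk the path where every stage succeeds.
role=planner
while [ "$role" != done ]; do
  printf '%s -> ' "$role"
  role=$(next_role "$role" ok)
done
echo done
```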

Use case 3 — Approval-gated dispatch for risky operations

When you launch a task with the require_approval constraint set in the planner's policy, it parks at pending_approval=true instead of dispatching. The dashboard renders an Approve / Reject alert inline. This is how teams gate deploys, schema migrations, or anything that should not run automatically just because a planner inferred it.

Use case 4 — Mid-flight question and answer

Inside a worker's MCP session, calling ask_question suspends the task with a pending_question. The dashboard shows the question text and a small textarea. Type a one-line answer, click Answer, and the worker resumes. Useful for "Should I prefer option A or B?" decisions you do not want to encode upfront.

Use case 5 — Abort a runaway task

When a worker is stuck in a loop or burning tokens on the wrong objective, Abort is the off switch. The reason you type lands in the orchestrator's history alongside the cancel event so future replays know why the run ended.

Use case 6 — Review a completed task

Terminal-state tasks (completed or failed) get an inline Approve / Request changes / Reject row. Click any of them and the digest row's review_state and reviewer are stamped immediately, with a review event appended to the task timeline. The action is idempotent: clicking the same button twice leaves the row in the same state. Use this when a verifier loop hands a worker's result back to a human for sign-off.
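
A review can also be built as a request body for POST /api/v1/orchestration/digest/review. The field names and the four actions come from the endpoint table in Behind The Scenes; review_payload and submit_review are hypothetical helpers. Whether this endpoint accepts an X-Api-Key header, as the commands endpoint does, or only a dashboard session is an assumption.

```shell
# Hypothetical helper: build a review body.
#   $1 = task id, $2 = action, $3 = planner id (optional)
review_payload() {
  task_id=$1
  action=$2
  planner_id=${3:-}
  case "$action" in
    approve|reject|request_changes|reset) ;;
    *) echo "invalid review action: $action" >&2; return 1 ;;
  esac
  if [ -n "$planner_id" ]; then
    printf '{"task_id":"%s","action":"%s","planner_id":"%s"}\n' \
      "$task_id" "$action" "$planner_id"
  else
    printf '{"task_id":"%s","action":"%s"}\n' "$task_id" "$action"
  fi
}

# Hypothetical wrapper; the X-Api-Key header is an assumption for this endpoint.
submit_review() {
  body=$(review_payload "$@") || return 1
  curl -fsS -X POST "https://nfltr.xyz/api/v1/orchestration/digest/review" \
    -H "X-Api-Key: $NFLTR_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
}

# Usage: submit_review "$TASK_ID" request_changes
```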

Use case 7 — Rerun a task

The ↻ Rerun button on terminal tasks dispatches a fresh task with the same objective, worker, and execution role, plus a parent_task_id that links it back to the source. The dashboard's plan-tree view nests the rerun under the original so multi-attempt orchestrations stay legible. This is fork-from-N=0: the new task starts clean rather than resuming mid-conversation.

Use case 8 — Per-task workspace isolation (worker-side opt-in)

Run the worker with --per-task-worktree and the worker materialises a fresh git worktree under <repo>/.nfltr/worktrees/<task-id> per task, on a task/<id> branch. The MCP subprocess sees its WorkDir rewritten to the worktree, so an LLM that misuses rm can only damage its own scratch space. Cleanup runs on task finalisation. Pair with --clone-cache-dir <path> to skip the cold clone after the first task against the same (repo, branch) pair.
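
To see what that layout looks like on disk, the same structure can be created by hand with plain git. This is a rough manual equivalent, not the worker's implementation: worktree_path, worktree_branch, and make_task_worktree are hypothetical helpers that follow the path and branch naming described above.

```shell
# Hypothetical helpers mirroring the layout described above.
worktree_path()   { printf '%s/.nfltr/worktrees/%s\n' "$1" "$2"; }
worktree_branch() { printf 'task/%s\n' "$1"; }

# Create a worktree for one task by hand.
#   $1 = absolute path to the repo, $2 = task id
make_task_worktree() {
  git -C "$1" worktree add -b "$(worktree_branch "$2")" "$(worktree_path "$1" "$2")"
}

# Usage: make_task_worktree /path/to/worker-clone "$TASK_ID"
```

Each worktree has its own checkout and branch, so edits in one task's directory do not touch files checked out for another task.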


Behind The Scenes

| Endpoint | Direction | Purpose |
|---|---|---|
| POST /api/v1/orchestration/digest | planner → relay | Push the planner's current task list (~5s cadence). Stores by authenticated principal. |
| GET /api/v1/orchestration/tasks | dashboard → relay | Read tasks scoped to the dashboard user (their planners + sibling agents). |
| POST /api/v1/orchestration/commands | dashboard → relay | Enqueue a command (start_task, approve_dispatch, reject_dispatch, answer_question, abort_task). |
| GET /api/v1/orchestration/commands | planner → relay | Long-poll for queued commands targeted at this planner (~2s cadence). |
| POST /api/v1/orchestration/commands/{id}/ack | planner → relay | Mark a command consumed; the relay drops it from the queue. |
| POST /api/v1/orchestration/digest/review | dashboard → relay | Stamp review_state + reviewer on a digest row. Body: {task_id, action, planner_id?}, where action is one of approve, reject, request_changes, reset. The relay verifies the row's planner_id is in the principal's owned set. |
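
The planner-side half of the command lifecycle (fetch, then ack) could be exercised by hand roughly as follows. This is a sketch under assumptions: nfltr mcp already does this internally, RELAY_URL is a hypothetical variable introduced for the example, the X-Api-Key header is assumed to authenticate planner calls, and the response format of the GET is not documented in this guide, so the body is returned untouched.

```shell
# Hypothetical variable: base URL of the relay.
RELAY_URL=${RELAY_URL:-https://nfltr.xyz}

# Build the ack endpoint URL for one command id.
ack_url() {
  printf '%s/api/v1/orchestration/commands/%s/ack\n' "$RELAY_URL" "$1"
}

# Fetch queued commands for this planner. The caller parses the body.
fetch_commands() {
  curl -fsS "$RELAY_URL/api/v1/orchestration/commands" \
    -H "X-Api-Key: $NFLTR_API_KEY"
}

# Mark one command consumed so the relay drops it from the queue.
ack_command() {
  curl -fsS -X POST "$(ack_url "$1")" -H "X-Api-Key: $NFLTR_API_KEY"
}
```

A command that is fetched but never acknowledged presumably stays queued, since the table describes the ack as the step that removes it.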

Worker safety-rail flags (opt-in)

| Flag | Effect |
|---|---|
| --per-task-worktree (env: NFLTR_WORKER_PER_TASK_WORKTREE) | For each task with a workspace context, materialise a fresh git worktree on a task/<task-id> branch and rewrite the MCP WorkDir to it. Cleanup on task finalisation. |
| --clone-cache-dir <path> (env: NFLTR_WORKER_CLONE_CACHE_DIR) | Cache primary checkouts under <path>/<cache-key> per (clone-url, branch). First task pays the clone cost; subsequent tasks fetch + reset only. |

Storage is pluggable. The hosted relay at nfltr.xyz uses a SQL store so commands and digests survive a restart. Every backend implements the same five-method Store interface, which keeps the wire shape consistent for planners and workers.


Troubleshooting

| Symptom | Likely cause | Action |
|---|---|---|
| My Active Orchestrations stays empty after starting nfltr mcp. | Planner cannot reach the relay, or the API key is for a different account. | Curl https://nfltr.xyz/healthz from the planner host. Confirm the dashboard email matches the API key's owner under Settings → API Keys. |
| Pulse dots have stopped, but tasks still show running. | Planner stopped pushing digests but tasks remain in the relay's last snapshot. | Restart nfltr mcp; the next digest tick refreshes the view. Stale entries time out on the relay side after ~5 minutes. |
| + Start reports "Pick a worker first" but the dropdown is empty. | No agents are registered under your account, or the worker process exited. | Run nfltr worker with the same API key. The dashboard refreshes the dropdown each time you re-open the form. |
| Approve / Answer button click toasts "Session expired". | Browser session lapsed. | Reload the page and re-auth via Google. |
| Command queued but task did not move. | Planner consumed the command but the orchestrator rejected it (e.g., task already terminal). | Check the planner log for "orchestration command execution failed" warnings; they include the command id and underlying error. |
| Review button toasts "task not found within owned planners". | The dashboard session belongs to a different account than the planner that published the task, or the digest row TTL'd out. | Confirm the dashboard email matches the planner's API key under Settings → API Keys. Terminal-state rows survive 7 days; live rows time out after 5 minutes of no refresh. |
| Worktree dir from a crashed prior task blocks rerun. | The worker died before finish() fired its workspace cleanup. | The next task with the same id reuses the path after a clean teardown. To force-evict, remove <repo>/.nfltr/worktrees/<task-id> manually, then git worktree prune in the parent repo. |
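
The force-evict steps in the last row can be wrapped in a guarded function so a typo cannot delete more than one task's directory. evict_worktree is a hypothetical helper; the path layout and the git worktree prune step are the ones given in the table.

```shell
# Hypothetical helper: remove a leftover worktree directory and prune the
# parent repo's worktree metadata.
#   $1 = repo, $2 = task id
evict_worktree() {
  repo=$1
  task_id=$2
  # Refuse an empty repo, and any task id that is empty, a dot entry, or
  # contains a slash, so rm -rf can only ever target one task directory.
  if [ -z "$repo" ]; then
    echo "usage: evict_worktree <repo> <task-id>" >&2
    return 1
  fi
  case "$task_id" in
    ""|.|..|*/*)
      echo "refusing suspicious task id: '$task_id'" >&2
      return 1
      ;;
  esac
  rm -rf -- "$repo/.nfltr/worktrees/$task_id"
  git -C "$repo" worktree prune
}

# Usage: evict_worktree /path/to/worker-clone "$TASK_ID"
```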
