Agent runtime & delegation

OpenHive's agents do not run on top of a static graph. The org chart you draw on the canvas only constrains who is allowed to call whom; actual delegation happens when the LLM chooses, turn by turn, to emit a delegate_to(...) tool call. Most of the runtime's shape (dynamic delegation, child context isolation, parallel-fork cache reuse, tool partitioning) falls out of that single decision.

Why a tool call instead of a graph

The most natural alternative is the LangGraph-style approach: compile the org chart so an edge like lead → writer is baked in at build time. OpenHive deliberately rejects that. Delegation is a per-turn judgement the LLM makes, not information you can know ahead of time. For the same input one turn delegates and another answers directly. One turn fans out to three subordinates in parallel; another sends to just one.

So delegate_to and delegate_parallel are exposed at the same level as every other tool. The org chart enters only as the enum of valid values for the assignee parameter: the canvas guarantees that only direct children pass validation. How that unified tool surface is assembled from different sources is covered in The tools an AI uses.
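To make the "org chart as enum" idea concrete, here is a minimal sketch of how a per-node delegate_to definition could be generated. The OrgNode and ToolDefinition shapes, the buildDelegateTool name, the choice to give leaf nodes no delegation tool, and the decision to mark all three parameters as required are assumptions for illustration, not OpenHive's actual code.

```typescript
// Hypothetical shapes; the real engine types may differ.
interface OrgNode {
  name: string;
  children: string[]; // names of this node's direct children on the canvas
}

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, unknown>;
    required: string[];
  };
}

// Build delegate_to for one node. The org chart shows up only as the
// enum on `assignee`; nothing else about the graph is compiled in.
function buildDelegateTool(node: OrgNode): ToolDefinition | null {
  // Assumption: a node without children is not offered the tool at all.
  if (node.children.length === 0) return null;
  return {
    name: "delegate_to",
    description: "Hand a task to one direct subordinate.",
    input_schema: {
      type: "object",
      properties: {
        assignee: { type: "string", enum: [...node.children] },
        prompt: { type: "string" },
        mode: { type: "string" }, // valid modes are not specified here
      },
      required: ["assignee", "prompt", "mode"],
    },
  };
}

const leadTool = buildDelegateTool({ name: "lead", children: ["writer", "researcher"] });
```

Because the enum is rebuilt from the canvas for each node, redrawing the org chart changes which assignees validate without touching any delegation logic.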

The LLM emits delegate_to → the engine validates, records events, spawns the child session, and re-injects the result into the parent's history.

What happens in a single delegation

When the parent agent's LLM produces a tool_use block of delegate_to("writer", prompt, mode), the engine processes it in this order.

01 Schema validation. assignee must be in the enum of the parent node's direct children on the canvas. That is the only constraint the org chart enforces.
02 Depth and pair-cap checks. max_delegation_depth and the per-turn max_delegations_per_pair_per_turn stop runaway loops. The depth cap bounds how far a delegation chain can nest; the pair cap prevents the model from calling the same child over and over in one turn.
03 delegation_opened event. A line is appended to events.jsonl and the corresponding node on the Run canvas immediately switches to active state. The UI does not watch a separate channel.
04 Child session spawn. The child starts with a fresh, isolated history. It does not inherit the parent's conversation; it sees only the prompt the parent explicitly passed in plus the child's AGENT.md system prompt. Context-leak prevention first; cache efficiency second.
05 Child turn loop runs. The child runs with its own model and its own toolset. It may delegate further, which is where recursion starts.
06 Result injection. When the child finishes, its output is appended to the parent's history as a tool_result block. From the parent model's perspective an ordinary tool call has just returned.
07 delegation_closed event. A close event is recorded along with success/failure status and the corresponding edge on the Run canvas deactivates.
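The seven steps above can be sketched as one function. This is an illustration under stated assumptions, not the code in session.ts: the DelegationContext shape, the in-memory events array standing in for events.jsonl, the runChild callback standing in for the child session, and the choice to surface rejections as error tool results are all hypothetical.

```typescript
type DelegationEvent =
  | { type: "delegation_opened"; parent: string; child: string }
  | { type: "delegation_closed"; parent: string; child: string; ok: boolean };

interface DelegateCall { assignee: string; prompt: string; mode: string }
interface ToolResult { type: "tool_result"; content: string; is_error: boolean }

interface DelegationContext {
  parent: string;
  directChildren: string[];
  depth: number;                    // depth of the parent; root is 0 (assumption)
  maxDelegationDepth: number;
  maxPerPairPerTurn: number;
  pairCounts: Map<string, number>;  // "parent->child" -> calls so far this turn
  events: DelegationEvent[];        // stand-in for events.jsonl
  parentHistory: ToolResult[];      // stand-in for the parent's message history
  runChild: (child: string, prompt: string) => Promise<string>; // steps 04 and 05
}

async function handleDelegateTo(call: DelegateCall, ctx: DelegationContext): Promise<ToolResult> {
  const reject = (reason: string): ToolResult => {
    const r: ToolResult = { type: "tool_result", content: reason, is_error: true };
    ctx.parentHistory.push(r); // the parent model still sees why the call failed
    return r;
  };

  // 01 Schema validation: assignee must be a direct child.
  if (!ctx.directChildren.includes(call.assignee)) return reject(`unknown assignee: ${call.assignee}`);

  // 02 Depth and pair-cap checks.
  const pair = `${ctx.parent}->${call.assignee}`;
  const used = ctx.pairCounts.get(pair) ?? 0;
  if (ctx.depth >= ctx.maxDelegationDepth) return reject("max_delegation_depth reached");
  if (used >= ctx.maxPerPairPerTurn) return reject("max_delegations_per_pair_per_turn reached");
  ctx.pairCounts.set(pair, used + 1);

  // 03 delegation_opened event.
  ctx.events.push({ type: "delegation_opened", parent: ctx.parent, child: call.assignee });

  // 04 + 05 Spawn the child with only the delegated prompt and run its turn loop.
  let result: ToolResult;
  try {
    const output = await ctx.runChild(call.assignee, call.prompt);
    result = { type: "tool_result", content: output, is_error: false };
  } catch (err) {
    result = { type: "tool_result", content: String(err), is_error: true };
  }

  // 06 Result injection: to the parent this looks like any tool returning.
  ctx.parentHistory.push(result);

  // 07 delegation_closed event with success/failure status.
  ctx.events.push({ type: "delegation_closed", parent: ctx.parent, child: call.assignee, ok: !result.is_error });
  return result;
}
```

Note that the child only ever receives call.prompt; nothing from parentHistory is passed to runChild, which is the isolation property step 04 describes.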

Why the child's context is empty

Not handing the parent's full message history to the child is intentional. Three effects follow.

Token cost separation. The parent's accumulated context is paid for only by the parent. A design in which each child carried the parent's entire history would get steeply more expensive as delegation deepens, because every level would re-pay for everything above it.
Role isolation. The writer doesn't need to know which user the lead was talking to. Only the prompt the parent consciously crafted defines the child's task.
Cache friendliness. The child's prefix (system prompt + AGENT.md + tool definitions) is identical across calls. Calling the same child again means a prefix cache hit.
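The three effects above follow from how the child session is seeded. A minimal sketch, assuming hypothetical names (ChildSession, startChildSession, prefixKey) and a simplified message shape: the parent's history is never an input, and everything before the first message is identical across calls.

```typescript
interface Message { role: "user" | "assistant"; content: string }

interface ChildSession {
  system: string;      // stable prefix: engine rules + AGENT.md body
  tools: string[];     // tool names in a stable order
  messages: Message[]; // starts with exactly one message: the delegated prompt
}

// The parent's history is deliberately not a parameter.
function startChildSession(
  engineRules: string,
  agentMdBody: string,
  toolNames: string[],
  delegatedPrompt: string,
): ChildSession {
  return {
    system: `${engineRules}\n\n${agentMdBody}`,
    tools: [...toolNames].sort(), // stable order keeps the prefix byte-identical
    messages: [{ role: "user", content: delegatedPrompt }],
  };
}

// Everything a prefix cache would key on; the delegated prompt is excluded.
function prefixKey(session: ChildSession): string {
  return JSON.stringify({ system: session.system, tools: session.tools });
}

const a = startChildSession("RULES", "You are the writer.", ["sql_query", "read_skill_file"], "Draft the intro.");
const b = startChildSession("RULES", "You are the writer.", ["read_skill_file", "sql_query"], "Draft the summary.");
const samePrefix = prefixKey(a) === prefixKey(b); // true: only the final user message differs
```

Two delegations to the same child differ only in the last user message, which is what makes a repeat call a candidate for a prefix cache hit.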

Parallel delegation and fork cache reuse

When delegate_parallel([writer, researcher, verifier]) dispatches several children in one turn, each child session shares the same parent snapshot at the same instant. For Claude providers the engine exploits this: the shared prompt prefix is cache-written once and the sibling children fork on top of that cache. As a result, fanning out work across N siblings keeps prompt-cache cost close to 1× rather than N×.

Other providers (Codex, Copilot, etc.) have different prefix-cache semantics, so fork.ts falls into the non_claude branch and only context isolation applies.
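A rough way to see why the Claude branch matters is to compare the relative cost of the shared prefix in each branch. The sketch below is not fork.ts: the forkBranch and relativePrefixCost names, the provider-id matching, and the cache multipliers are assumptions for illustration. In particular, the write and read multipliers are placeholders to be replaced with the provider's current pricing, and the non_claude branch is modelled as receiving no cache discount at all.

```typescript
type ForkBranch = "claude" | "non_claude";

// Hypothetical provider ids; the real check may look at something else.
function forkBranch(providerId: string): ForkBranch {
  return providerId.toLowerCase().startsWith("claude") ? "claude" : "non_claude";
}

// Cost multipliers relative to the normal (uncached) input-token price.
interface CacheMultipliers {
  write: number; // first sibling writes the shared prefix into the cache
  read: number;  // remaining siblings read it back
}

// Relative cost of the shared prefix for `siblings` parallel children,
// where 1 means "the prefix paid for once at the normal price".
function relativePrefixCost(siblings: number, branch: ForkBranch, m: CacheMultipliers): number {
  if (siblings <= 0) return 0;
  if (branch === "non_claude") return siblings; // assumption: every sibling pays in full
  return m.write + (siblings - 1) * m.read;
}

// Illustrative multipliers only.
const illustrative: CacheMultipliers = { write: 1.25, read: 0.1 };
const withFork = relativePrefixCost(3, forkBranch("claude"), illustrative);
const withoutFork = relativePrefixCost(3, forkBranch("codex"), illustrative);
```

With these placeholder numbers, three siblings cost roughly 1.45× the prefix on the fork path versus 3× without it, and the gap widens as N grows.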

Many tools in a single turn — how they're executed

The model can emit several tool_use blocks in one assistant turn. Running them all serially is safe but slow; running them all in parallel makes side effects collide. OpenHive uses a classifier called tool partition v2 to put each call into one of four classes, then applies a different concurrency policy per class.

Tool name → class. trajectory and serial_write run serially; parallel_trajectory and safe_parallel run up to a per-class cap.

trajectory (serial). delegate_parallel, ask_user, activate_skill, set_todos / add_todo / complete_todo. These mutate run-scoped state (todos, ask_user inbox, the active skill set) that the next tool in the same batch may read, so they cannot overlap. One at a time, in order.
parallel_trajectory (cross-subordinate parallel). The dedicated class for delegate_to. When children differ, their pair counters, scratch directories, and ledger rows are disjoint, so simultaneous execution is safe. Because each branch spawns an LLM stream, the cap is OPENHIVE_PARALLEL_DELEGATION_MAX — separate from the safe_parallel cap.
serial_write (serial). sql_exec, run_skill_script. Arbitrary Python and team-DB writes are serialised to avoid intra-turn races.
safe_parallel (parallel). sql_query, read_skill_file, the web-fetch / web-search skills, and every mcp__* tool. Side-effect-free or idempotent, so the engine fires up to the class's cap in one go; oversize buckets split into consecutive parallel runs of cap-size each.

When delegate_to(writer) and delegate_to(researcher) appear side by side in one turn, both fire in parallel within the same class (parallel_trajectory). That is exactly what motivated v2: the v1 rule of "trajectory is always serial" was needlessly costly when different children share no state.
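The four classes and their concurrency policies can be sketched as a classifier plus a batch planner. This is not tool-partition.ts. The web_fetch and web_search tool names, the conservative fallback for unknown tools, and the rule that only consecutive calls of the same class are grouped are all assumptions; the real engine may order work across classes differently.

```typescript
type ToolClass = "trajectory" | "parallel_trajectory" | "serial_write" | "safe_parallel";

const TRAJECTORY = new Set(["delegate_parallel", "ask_user", "activate_skill", "set_todos", "add_todo", "complete_todo"]);
const SERIAL_WRITE = new Set(["sql_exec", "run_skill_script"]);
// web_fetch / web_search are hypothetical names for the web skills.
const SAFE_PARALLEL = new Set(["sql_query", "read_skill_file", "web_fetch", "web_search"]);

function classify(toolName: string): ToolClass {
  if (toolName === "delegate_to") return "parallel_trajectory";
  if (TRAJECTORY.has(toolName)) return "trajectory";
  if (SERIAL_WRITE.has(toolName)) return "serial_write";
  if (SAFE_PARALLEL.has(toolName) || toolName.startsWith("mcp__")) return "safe_parallel";
  return "serial_write"; // assumption: unknown tools get the most conservative class
}

interface ParallelCaps {
  parallel_trajectory: number; // e.g. read from OPENHIVE_PARALLEL_DELEGATION_MAX
  safe_parallel: number;
}

// Turn one assistant turn's tool calls into batches. Each inner array runs
// concurrently; the batches themselves run one after another.
function planBatches(toolNames: string[], caps: ParallelCaps): string[][] {
  const batches: string[][] = [];
  let i = 0;
  while (i < toolNames.length) {
    const cls = classify(toolNames[i]);
    if (cls === "trajectory" || cls === "serial_write") {
      batches.push([toolNames[i]]); // serial classes: one call per batch, in order
      i += 1;
      continue;
    }
    // Collect the run of consecutive calls in the same parallel class.
    let j = i;
    while (j < toolNames.length && classify(toolNames[j]) === cls) j += 1;
    // Oversize buckets split into consecutive runs of at most `cap` calls.
    const cap = Math.max(1, caps[cls]);
    for (let k = i; k < j; k += cap) {
      batches.push(toolNames.slice(k, Math.min(k + cap, j)));
    }
    i = j;
  }
  return batches;
}

const plan = planBatches(
  ["sql_query", "sql_query", "sql_query", "delegate_to", "delegate_to", "sql_exec"],
  { parallel_trajectory: 2, safe_parallel: 2 },
);
// plan: [["sql_query","sql_query"], ["sql_query"], ["delegate_to","delegate_to"], ["sql_exec"]]
```

The two delegate_to calls land in one batch, which is the v2 behaviour described above; under a v1-style rule they would have been two single-call batches.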

Agent persona: AGENT.md

Each node's identity lives in a single file at packages/agents/{name}/AGENT.md. The runtime reads it and splices the body into the system prompt. The result looks roughly like this.

<system>
[Engine-wide rules — tool-use protocol, delegation guidance, output format]

[AGENT.md body — this node's role, tone, prohibitions, descriptions of its delegates]

[Current team state — direct children, which modes can be called]

[Active skills — bodies of any SKILL.md activated via activate_skill]
</system>
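The layout above can be assembled with a small function. This is a sketch rather than the code in runtime.ts: the PromptParts shape, the buildSystemPrompt name, and the wording of the team-state line are hypothetical.

```typescript
interface PromptParts {
  engineRules: string;         // engine-wide rules
  agentMdBody: string;         // body of this node's AGENT.md
  directChildren: string[];    // current team state
  activeSkillBodies: string[]; // SKILL.md bodies activated via activate_skill
}

function buildSystemPrompt(parts: PromptParts): string {
  const teamState =
    parts.directChildren.length > 0
      ? `Direct children: ${parts.directChildren.join(", ")}`
      : "Direct children: none";
  // Order matters: stable sections first, the most volatile (skills) last.
  const sections = [parts.engineRules, parts.agentMdBody, teamState, ...parts.activeSkillBodies];
  return sections
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .join("\n\n");
}

const systemPrompt = buildSystemPrompt({
  engineRules: "Follow the tool-use protocol.",
  agentMdBody: "You are the lead. Delegate drafting to the writer.",
  directChildren: ["writer", "researcher"],
  activeSkillBodies: [],
});
```

Keeping the AGENT.md body as an opaque string is what makes editing the file equivalent to tuning the agent: no other input needs to change.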

Editing AGENT.md is agent tuning. You're not toggling a config knob; you're rewriting in markdown what kind of person this node is.

Safety caps, in one place

Dynamic delegation is powerful but prone to runaway. The caps the engine enforces are:

max_delegation_depth: depth of the delegation tree. Stops the child-of-a-child-of-a-child... chain from going infinite.
max_delegations_per_pair_per_turn: how many times the same (parent → child) pair can be invoked within a single turn.
per-assignee max_parallel: caps how many concurrent delegations can target the same child. If delegate_parallel emits N > cap calls to one assignee, only cap of them go through.
delegationSatisfied flag: once a child's result has come back to the parent, the same pair is blocked from being re-delegated within the same turn. This stops the model from calling again before it has even read the result.
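The four caps can be gathered into one per-turn guard object. The sketch below is hypothetical throughout: the TurnDelegationGuard class, its method names, the convention that the root agent sits at depth 0, and the order in which the checks run are assumptions, not the enforcement code in session.ts.

```typescript
interface TurnCaps {
  maxDelegationDepth: number;
  maxDelegationsPerPairPerTurn: number;
  maxParallelPerAssignee: number;
}

// One guard per parent turn; a fresh instance resets all per-turn state.
class TurnDelegationGuard {
  private readonly caps: TurnCaps;
  private readonly pairCounts = new Map<string, number>();
  private readonly satisfied = new Set<string>();

  constructor(caps: TurnCaps) {
    this.caps = caps;
  }

  // Returns the name of the cap that blocks the call, or null if it may proceed.
  check(parent: string, child: string, parentDepth: number): string | null {
    const pair = `${parent}->${child}`;
    if (parentDepth >= this.caps.maxDelegationDepth) return "max_delegation_depth";
    if (this.satisfied.has(pair)) return "delegationSatisfied";
    const used = this.pairCounts.get(pair) ?? 0;
    if (used >= this.caps.maxDelegationsPerPairPerTurn) return "max_delegations_per_pair_per_turn";
    return null;
  }

  // Call when a delegation is opened.
  recordOpened(parent: string, child: string): void {
    const pair = `${parent}->${child}`;
    this.pairCounts.set(pair, (this.pairCounts.get(pair) ?? 0) + 1);
  }

  // Call when the child's result has been injected into the parent's history.
  recordResult(parent: string, child: string): void {
    this.satisfied.add(`${parent}->${child}`);
  }

  // Per-assignee max_parallel: keep at most `cap` calls per assignee, in order.
  limitParallel(assignees: string[]): string[] {
    const seen = new Map<string, number>();
    return assignees.filter((a) => {
      const n = seen.get(a) ?? 0;
      if (n >= this.caps.maxParallelPerAssignee) return false;
      seen.set(a, n + 1);
      return true;
    });
  }
}

const guard = new TurnDelegationGuard({
  maxDelegationDepth: 2,
  maxDelegationsPerPairPerTurn: 2,
  maxParallelPerAssignee: 2,
});
const kept = guard.limitParallel(["writer", "writer", "writer", "researcher"]);
// kept: ["writer", "writer", "researcher"]
```

Returning the name of the blocking cap, rather than a bare boolean, lets the engine put a specific reason into the error result so the model can adjust instead of retrying blindly.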

Related code

apps/web/lib/server/engine/session.ts — turn loop, delegation tool dispatch, cap enforcement
apps/web/lib/server/engine/fork.ts — prefix-cache fork for parallel delegation
apps/web/lib/server/engine/tool-partition.ts — tool → class classifier
apps/web/lib/server/agents/runtime.ts, packages/agents/*/AGENT.md — agent persona loading