6 min read

Designing Multi-Agent Architectures That Don't Collapse

Role boundaries, escalation chains, and separation of concerns — patterns we learned running 6 AI agents in production.

agents architecture openclaw multi-agent

Six agents. One orchestrator. A content pipeline that runs while we sleep.

That’s what we built at Jorsby. And the first version was chaos.

Agents stepped on each other. The orchestrator did everyone’s job. Communication was a free-for-all. Reviewing output was impossible. We rebuilt the architecture three times before landing on patterns that actually hold up under daily use.

Here’s what we learned.

The manager agent shouldn’t write code

Our main orchestrator, Jorsby, coordinates the other five agents. It decides what content to create, delegates work, manages the publishing schedule. It’s the manager.

One day, we found it had edited the dashboard code directly. A “quick fix” to a display bug. The code worked — Jorsby is capable — but it violated the entire architecture. The manager was doing the engineer’s job.

This is a real failure mode with agent systems. LLMs are general-purpose. If your orchestrator agent has access to code files and it sees a bug, it’ll fix it. Not because it’s ignoring your rules — because fixing the bug is, from its perspective, the most efficient path to completing the task.

We fixed it with a hard rule: the main agent cannot edit files in ~/Development/. If something needs code changes, it spawns the dev agent. No exceptions.

# In the orchestrator's workspace config
restricted_paths:
  - ~/Development/**
  - ~/Projects/**

# If code needs changing:
# 1. Orchestrator describes the problem
# 2. Spawns dev agent with the task
# 3. Dev agent makes changes
# 4. Dev agent reports back with proof

The principle: a manager who does the engineer’s job is a bad manager and a bad engineer. The manager lacks the full context the engineer has. The engineer doesn’t get to build their understanding of the system. Both roles suffer.

This applies to any multi-agent system. Define what each agent can and can’t do, then enforce those boundaries. Not with instructions — with actual access controls. (We covered this in our first post — constraints always beat instructions.)
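As a sketch, the enforcement side of that config can live in the tool layer as a simple path check. The `RESTRICTED_ROOTS` list and `can_edit` helper here are illustrative, not Jorsby's actual implementation:

```python
from pathlib import Path

# Illustrative restricted roots, mirroring the workspace config above.
RESTRICTED_ROOTS = ["~/Development", "~/Projects"]

def can_edit(agent_role: str, target: str) -> bool:
    """Deny file edits under restricted roots to every role except dev."""
    if agent_role == "dev":
        return True
    path = Path(target).expanduser()
    for root in RESTRICTED_ROOTS:
        root_path = Path(root).expanduser()
        # Deny if the target is the restricted root itself or anything beneath it.
        if path == root_path or root_path in path.parents:
            return False
    return True
```

The point is that the check lives in tooling, not in the prompt: the orchestrator can be told not to edit code, but a gate like `can_edit` makes it unable to.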

Separate creation from publishing

Our content pipeline has two distinct phases: creation and publishing. Initially, we ran both through the orchestrator.

Content agents would generate drafts and send them to the orchestrator, which reviewed them and dispatched them to platforms. This created a massive bottleneck. The orchestrator became a single point of failure and a throughput ceiling: three content agents producing drafts, one agent dispatching everything.

The fix was separating the phases but distributing execution:

  1. Content agent generates a draft
  2. Draft goes to the Discord review channel for human review
  3. Human approves (reaction or command)
  4. The same content agent dispatches to platforms directly

The approval gate still exists — that’s non-negotiable. We’re not letting agents post to social media without human review. But once approved, the content agent that created the draft also handles publishing. It already has the full context: the image, the captions, the platform targets, the hashtags.

The key insight: the approval gate and the execution step don’t need to be in the same agent. You want centralized review and distributed execution. The bottleneck should be the human decision, not an agent relay.
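That shape can be sketched in a few lines. The class and field names here are invented for illustration; the real pipeline runs through Discord:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    author: str            # content agent that created it, e.g. "EN"
    platforms: list        # target platforms for this draft
    approved: bool = False

class ReviewGate:
    """Centralized approval, distributed execution (a sketch)."""

    def __init__(self):
        self.pending = []
        self.dispatched = []   # (author, platform) pairs

    def submit(self, draft: Draft):
        # Creation phase ends here; the draft waits for a human.
        self.pending.append(draft)

    def approve(self, draft: Draft):
        # The human decision is the only centralized step.
        draft.approved = True
        self.pending.remove(draft)
        # Publishing goes back to the agent that owns the context.
        for platform in draft.platforms:
            self.dispatched.append((draft.author, platform))
```

Note what is absent: the orchestrator never appears in the publish path. The human approval is the bottleneck by design, and nothing else is.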

Escalation chains over flat communication

With six agents, the naive approach is letting everyone talk to everyone. Agent A needs something from Agent C? Send a message. Agent D has a question? Ask whoever seems relevant.

This doesn’t work. Six agents with full pairwise communication means 15 channels and 30 directed message paths. Nobody knows who’s responsible for what. Messages get duplicated. Decisions get made in the wrong place.

We replaced it with a strict escalation chain:

Content Agents (EN, TR, AR)
       ↓ escalate to
   TechOps Agent
    ↓           ↓
Dev Agent    Orchestrator
(code)       (strategy)
    ↓ reports back to
TechOps Agent (verifies)

Each agent has exactly one escalation target. Content agents escalate to TechOps. TechOps escalates to Dev for code changes or to the Orchestrator for strategic decisions. Dev reports back to TechOps for verification.

This does three things:

  1. No decision fatigue. An agent never has to figure out who to ask. It has one target.
  2. Clear accountability. If something goes wrong in publishing, the chain of decisions is traceable.
  3. Appropriate context. TechOps understands both the content side and the technical side. It’s the right layer to triage whether something is a code issue or a strategy issue.
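The whole chain reduces to a small routing table. Agent names and the `route` helper below are illustrative, not the actual system:

```python
# Each agent has exactly ONE escalation target.
ESCALATES_TO = {
    "content_en": "techops",
    "content_tr": "techops",
    "content_ar": "techops",
    "dev": "techops",        # dev reports back for verification
}

def route(agent: str, issue_kind: str = "general") -> str:
    """Return the single agent that should receive this escalation."""
    if agent == "techops":
        # TechOps is the triage layer: code issues go to dev,
        # everything strategic goes to the orchestrator.
        return "dev" if issue_kind == "code" else "orchestrator"
    return ESCALATES_TO[agent]  # a KeyError here is an architecture bug
```

A content agent never chooses a recipient; `route` has exactly one answer for it, and only the triage layer branches.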

The military has been doing this for centuries. You report to your direct superior, not to the general. It works because it limits the blast radius of any single communication failure.

No proof, not done

This one is simple but it changed our reliability overnight.

Every engineering task — whether done by a human or an agent — must produce proof of completion. Not “I fixed it.” Not “It should work now.” Actual evidence.

What counts as proof:

  • Screenshot of the working feature
  • curl output showing the API returns the right response
  • Test results from the actual test suite
  • Before/after diff showing what changed and why
  • Log output demonstrating the fix in action

# Bad: agent reports completion
"Fixed the image overlay bug. The issue was in the positioning logic."

# Good: agent reports completion with proof
"Fixed the image overlay bug. The issue was negative margin calculation
in overlay.py:142.

Before: text rendered at y=-20 (off-screen)
After: text renders at y=80 (visible, in safe zone)

Test output:
✓ overlay_positioning_test — 12/12 passed
✓ arabic_rtl_test — 8/8 passed

Screenshot attached: [overlay_fix_proof.png]"

We enforced this as a rule in agent workflows: the dev agent’s task isn’t complete until proof is attached to the report. TechOps verifies the proof before marking the task done.

This catches the most dangerous failure mode in agent systems: confident incorrectness. An LLM will tell you it fixed the bug with absolute certainty even when the fix doesn’t work. Proof is the only antidote.
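The gate itself can be tiny. The report schema below is invented for illustration; the idea is just that a report without evidence cannot be marked done:

```python
# Evidence kinds that count as proof, per the list above.
PROOF_KINDS = {"screenshot", "curl_output", "test_results", "diff", "logs"}

def is_done(report: dict) -> bool:
    """TechOps-style check: a completion report with no proof is not done."""
    evidence = report.get("evidence", [])
    return any(item.get("kind") in PROOF_KINDS for item in evidence)
```

The check is crude on purpose. It can't verify that the proof is valid (TechOps and the human do that); it only guarantees an agent cannot close its own task on assertion alone.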

Putting it together

These patterns reinforce each other. Role boundaries prevent agents from stepping on each other. Escalation chains prevent communication chaos. Separation of creation and publishing prevents bottlenecks. Proof requirements prevent false completion.

The architecture we landed on:

┌──────────────────────────────────────────────┐
│             Orchestrator (Jorsby)            │
│         Strategy, scheduling, delegation     │
│         CANNOT: edit code, post directly     │
└──────────────┬───────────────────────────────┘

       ┌───────┴───────┐
       ▼               ▼
┌─────────────┐  ┌─────────────┐
│   TechOps   │  │  Dev Agent  │
│  Operations │←→│    Code     │
│  Monitoring │  │   Changes   │
│  Triage     │  │   + Proof   │
└──────┬──────┘  └─────────────┘

  ┌────┼────┐
  ▼    ▼    ▼
┌───┐┌───┐┌───┐
│ EN││ TR││ AR│  Content Agents
│   ││   ││   │  Create → Review → Publish
└───┘└───┘└───┘

None of this is revolutionary. It’s basic organizational design, applied to AI agents. But that’s the point — the same patterns that make human teams work also make agent teams work. Clear roles, clear communication lines, clear expectations, and verified output.

The takeaway

If your multi-agent system feels chaotic, it’s probably an architecture problem, not an AI problem. Agents are capable. The challenge is giving them the right structure.

Define roles. Enforce boundaries. Limit communication to clear escalation chains. Separate creation from publishing. Require proof of every completed task.

Build the org chart before you build the agents.