January 27, 2025
Lessons in Agentic Orchestration and Context Management
When I set out to build an AI system that clones and rebrands websites, I quickly realized that the challenge wasn't just about prompting an LLM. The real complexity lay in orchestrating multiple agents to work together coherently while managing the finite resource that makes or breaks any AI application: context.
The Problem with Single-Agent Systems
The naive approach to building an AI-powered website generator is straightforward: give a powerful model all the information it needs and let it work. This works for simple tasks, but falls apart when dealing with multi-page websites. A single agent trying to build ten pages sequentially will:
- Exhaust its context window before finishing
- Lose coherence as earlier decisions fade from memory
- Take forever, since each page must wait for the previous one
The answer wasn't to find a bigger context window. It was to rethink the architecture entirely.
The Orchestrator-Worker Pattern
The system uses a hierarchical multi-agent architecture. At the top sits an orchestrator agent responsible for high-level decisions: analyzing the source website, establishing design patterns, and creating shared components. Below it, specialized worker agents handle individual pages in parallel.
This separation of concerns mirrors how human teams work. A lead architect doesn't write every line of code. They establish patterns, create shared foundations, and delegate implementation to team members who work concurrently.
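In TypeScript terms, that division of labor looks roughly like the sketch below. The type and function names are illustrative placeholders rather than the actual implementation; the point is the ordering: global analysis and shared setup first, then one delegated task per page.

```typescript
// Illustrative shapes and parameter names; not the project's real API.
interface Analysis {
  pages: string[];        // routes discovered on the source site
  designSummary: string;  // patterns the orchestrator established
}
interface PageTask { route: string; designNotes: string; }
interface WorkerResult { route: string; outputFile: string; }

// The orchestrator handles the global work up front (analysis, design
// patterns, shared components), then delegates one task per page to workers.
async function orchestrate(
  analyzeSourceSite: (url: string) => Promise<Analysis>,
  createSharedComponents: (analysis: Analysis) => Promise<void>,
  runWorkerAgent: (task: PageTask) => Promise<WorkerResult>,
  sourceUrl: string,
): Promise<WorkerResult[]> {
  const analysis = await analyzeSourceSite(sourceUrl);
  await createSharedComponents(analysis);

  const tasks: PageTask[] = analysis.pages.map((route) => ({
    route,
    designNotes: analysis.designSummary,
  }));

  // Workers build their pages concurrently; the orchestrator collects
  // and integrates the results.
  return Promise.all(tasks.map((task) => runWorkerAgent(task)));
}
```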
Scoped Autonomy: The Key to Multi-Agent Coordination
Here's a counterintuitive insight: the more you constrain an agent, the better it performs. Unlimited autonomy leads to chaos when multiple agents work on the same codebase.
Worker agents have what I call "scoped autonomy." They can read any file (they need context about shared components), but they can only write to their assigned output file.
This simple constraint prevents:
- Agents stepping on each other's work
- Race conditions in file modifications
- Cascading errors from well-intentioned but misguided "improvements"
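A minimal sketch of how that constraint can be enforced at the tool layer, assuming a hypothetical makeScopedWriteTool factory handed to each worker:

```typescript
import * as fs from "node:fs/promises";
import * as path from "node:path";

// Hypothetical write tool given to a worker agent. Reads stay unrestricted;
// writes are only allowed to the single file the worker was assigned.
function makeScopedWriteTool(assignedOutputFile: string) {
  const allowed = path.resolve(assignedOutputFile);

  return async function writeFile(target: string, content: string): Promise<void> {
    if (path.resolve(target) !== allowed) {
      // Refuse "helpful" edits to shared components or other workers' pages.
      throw new Error(`Worker may only write to ${allowed}, not ${target}`);
    }
    await fs.writeFile(allowed, content, "utf8");
  };
}
```

Because the check lives in the tool rather than in the prompt, a misguided agent can ask to modify another page but simply can't.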
Context Management: The Silent Killer
Context windows are expensive. Every token you feed to the model costs money and, more importantly, displaces potentially useful information. In a system where agents write code, context can explode rapidly.
Consider what happens when an agent writes a 200-line React component. That content goes into the conversation history. If the agent writes ten files, you've suddenly consumed thousands of tokens just from your own outputs.
My solution was aggressive content summarization. When a tool writes or edits a file, the full content doesn't need to persist in the conversation history. The agent doesn't need to re-read its own output verbatim; it only needs to know the file exists and can reference it if needed.
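One way to express the idea, with assumed message shapes: instead of echoing the file back into the history, the tool result records that the write happened and how to get the content again if needed.

```typescript
// Assumed conversation-message shape; the real system's types may differ.
interface ToolMessage {
  role: "tool";
  name: string;
  content: string;
}

// After a write succeeds, keep a short summary in the history
// instead of the full file contents.
function summarizeWriteResult(filePath: string, content: string): ToolMessage {
  const lineCount = content.split("\n").length;
  return {
    role: "tool",
    name: "write_file",
    // The agent only needs to know the file exists and roughly what it holds.
    content: `Wrote ${filePath} (${lineCount} lines). Read the file again if you need its contents.`,
  };
}
```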
Token Tracking as a First-Class Concern
Every agent tracks its token usage in real-time. This isn't just for billing purposes. It enables:
- Graceful degradation: Agents can adjust behavior when approaching limits
- Performance monitoring: Identify which tasks consume disproportionate context
- Debugging: Understand why an agent might have "forgotten" earlier instructions
When an orchestrator sees it's at 75% utilization with three more pages to delegate, it can make intelligent decisions about batching or summarization.
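A rough sketch of what that accounting might look like; the usage numbers would come from the metadata most LLM APIs return with each response, and the 75% threshold here is just the example above.

```typescript
// Hypothetical token accounting helper.
class TokenBudget {
  private used = 0;

  constructor(private readonly limit: number) {}

  record(promptTokens: number, completionTokens: number): void {
    this.used += promptTokens + completionTokens;
  }

  get utilization(): number {
    return this.used / this.limit;
  }
}

// At (say) 75% utilization, the orchestrator can batch remaining pages
// or summarize its history before delegating more work.
function shouldCompact(budget: TokenBudget, threshold = 0.75): boolean {
  return budget.utilization >= threshold;
}
```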
Parallel Execution: Not Just for Speed
Running worker agents in parallel isn't just about performance (though building a five-page site in parallel is significantly faster than building it sequentially). It's about isolation.
When agents run concurrently without sharing mutable state, entire categories of bugs disappear. Each agent operates in its own context, makes its own decisions, and produces its own output. The orchestrator collects results and integrates them, but the workers never directly interact.
This pattern also simplifies error handling. If one worker fails, the others continue. The orchestrator can retry the failed page or report partial success.
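Promise.allSettled captures both properties in a few lines: workers run independently, and a single failure doesn't sink the batch. The task and result shapes below are illustrative, not the real ones.

```typescript
// Illustrative shapes, assumed rather than taken from the real codebase.
interface PageTask { route: string; }
interface WorkerResult { route: string; outputFile: string; }

// Fan the page tasks out to workers; failures are collected for retry
// instead of aborting the whole build.
async function buildPagesInParallel(
  tasks: PageTask[],
  runWorker: (task: PageTask) => Promise<WorkerResult>,
): Promise<{ succeeded: WorkerResult[]; failed: PageTask[] }> {
  const settled = await Promise.allSettled(tasks.map((t) => runWorker(t)));

  const succeeded: WorkerResult[] = [];
  const failed: PageTask[] = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") succeeded.push(result.value);
    else failed.push(tasks[i]); // candidate for a retry or a partial-success report
  });
  return { succeeded, failed };
}
```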
The Agentic Loop: Simplicity at the Core
Despite the complexity of multi-agent orchestration, each individual agent runs a remarkably simple loop:
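call the model, execute any tools it requests, append the results to the conversation, and repeat until the model stops asking for tools. In outline (with assumed message and tool-call shapes, not the system's real types), it looks roughly like this:

```typescript
interface Message { role: "system" | "user" | "assistant" | "tool"; content: string; }
interface ToolCall { name: string; args: Record<string, unknown>; }
interface LLMResponse { text: string; toolCalls: ToolCall[]; }

async function runAgentLoop(
  callLLM: (history: Message[]) => Promise<LLMResponse>,
  executeTool: (call: ToolCall) => Promise<string>,
  history: Message[],
  maxIterations = 25,
): Promise<string> {
  for (let i = 0; i < maxIterations; i++) {
    const response = await callLLM(history);
    history.push({ role: "assistant", content: response.text });

    // No tool calls means the agent considers its task finished.
    if (response.toolCalls.length === 0) return response.text;

    for (const call of response.toolCalls) {
      // For the orchestrator, "execute tools" can mean running an entire
      // worker agent, which executes this same loop for its own page.
      const result = await executeTool(call);
      history.push({ role: "tool", content: result });
    }
  }
  throw new Error("Agent exceeded maximum iterations");
}
```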
The power comes from composition. The orchestrator's "execute tools" step includes delegating to workers, which run their own loops. This recursive structure handles arbitrary complexity while keeping each component simple.
Cancellation: The Forgotten Feature
Real-world systems need graceful cancellation. Users change their minds. Errors occur. Resources need to be freed.
I implemented cancellation as a first-class concern. At multiple checkpoints in the agentic loop, agents check a cancellation flag:
- Before starting a new iteration
- After receiving an LLM response
- Before executing expensive operations
When cancelled, agents clean up gracefully and emit appropriate events.
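One way to thread that through the loop is the standard AbortSignal; the checkpoints in this sketch mirror the list above, and the step and tool shapes are assumptions.

```typescript
interface ToolCall { name: string; args: Record<string, unknown>; }
interface LLMStep { done: boolean; toolCalls: ToolCall[]; }

async function runCancellableLoop(
  signal: AbortSignal,
  step: () => Promise<LLMStep>,
  executeTool: (call: ToolCall) => Promise<void>,
  onCancelled: () => void, // clean up and emit a cancellation event
): Promise<void> {
  while (true) {
    if (signal.aborted) return onCancelled();   // before starting a new iteration

    const response = await step();
    if (signal.aborted) return onCancelled();   // after receiving an LLM response

    for (const call of response.toolCalls) {
      if (signal.aborted) return onCancelled(); // before an expensive operation
      await executeTool(call);
    }
    if (response.done) return;
  }
}
```

The caller holds the matching AbortController and calls abort() when the user cancels or an unrecoverable error surfaces elsewhere in the system.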
Lessons Learned
Building this system taught me that agentic architectures require different thinking than traditional software: context is a budget to manage rather than a given, agents perform best with tightly scoped autonomy, and complex work is better handled by a hierarchy of simple loops than by one all-knowing agent.
The future of software will increasingly involve systems where multiple AI agents collaborate on complex tasks. The patterns we establish now, around context management, scoped autonomy, and hierarchical orchestration, will shape how we build these systems for years to come.