Hermes Iteration Budget Explanation | Generated by AI
Here’s how the iteration budget works in Hermes:
Iteration Budget — How It Works
The core class is agent/iteration_budget.py — a thread-safe consume/refund counter (~62 lines). Each AIAgent instance (parent or subagent) holds one.
Defaults:
- Parent agent:
max_iterations=90(configurable) - Subagents (via
delegate_task):delegation.max_iterations(default 50), independent budget
The loop in agent/conversation_loop.py line 796:
while (api_call_count < agent.max_iterations
and agent.iteration_budget.remaining > 0) \
or agent._budget_grace_call:
Each iteration = one API call (which may include multiple tool calls in that turn).
Budget flow per iteration:
- Check interrupt — if user sent
/stopor new message, break immediately - Grace call check — if
_budget_grace_callis set, consume the flag and let this one last iteration run (then exit regardless) consume()— tries to deduct 1 from remaining. If already at 0, prints the ⚠️ warning you saw and breaks- Run API call — model responds, tools execute
- Refund on certain conditions:
execute_codeiterations get refunded (line 3403, 3883) — programmatic tool calls don’t eat your budget- Context compression restarts get refunded (line 3403) — retrying with compressed context doesn’t count
- Ollama context-too-small errors get refunded (line 1100)
When budget is fully exhausted (line 4332-4349):
- If no final response yet, Hermes strips all tools and makes ONE extra call asking the model to summarize what it’s done so far
- This is the “grace call” — the model gets a chance to wrap up cleanly instead of just dying mid-tool-use
The message you saw:
⚠ Iteration budget reached (60/60) — response may be incomplete
This means the agent hit max_iterations=60 (either configured in your config.yaml or set by the caller). The response was the model’s summary attempt after budget exhaustion.
Key design decisions:
- Budget is per-agent, not global — a subagent burning 50 iterations doesn’t消耗 the parent’s budget
- Thread-safe via
threading.Lock— safe for concurrent gateway sessions - Refunds prevent
execute_codeloops from starving the main conversation - The grace call ensures you always get some response, even if truncated