OpenClaw Boils Oceans Unnecessarily
OpenClaw is a fascinating and valuable exploration of what's possible when you let LLMs take primacy: to code, to orchestrate, to act.
When it was still a small experiment, scale considerations were rightly pushed to the back of the mind.
Now that it has hundreds of millions of script-kiddie-style deployments, some of those decisions have consequences. We are living with them now: ubiquitous 429/529 errors across all major inference providers.
How
At the scale of OpenClaw and variants, a temporary server overload can lead to self-amplifying waves of cascading failures across the entire decentralized network of inference providers (often sharing AWS or Google Compute resources).
Like a lot of people, I have tiny middleware glue LLM calls in my orchestration flow. They are mostly deterministic, but there is enough value¹ in having agentic intelligence that this routing logic lives in an LLM rather than in Python/Typescript/Go/Rust.
For example:
- (called on each heartbeat) "check time, check scheduler, fire off all outstanding tasks"
- "assess text prompt, break it down into bite-sized tasks, identify and route to appropriate workers/models"
- "assess user conversation, do quick memory scan, proactively inject relevant memory into context"
Each LLM that receives a prompt also does a little bit of thinking:
- "hmm, the user appears to be asking me to do X"
- "I should first do a quick confirmation that resources exist which are required to do (my understanding of) X"
- "Do not waste effort if resources are unavailable. Try an alternative path to deliver what the user is asking. If that fails, ponder whether my understanding of X could have been mistaken, and try a likely alternative. Ask the user for guidance when it makes sense (but I have been trained away from asking the user too often, as that delivers a less agentic experience)."
Each model call is probably wrapped in mindless retry logic, e.g., exponential backoff with max N retries.
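That "mindless" wrapper, as a minimal Python sketch (`call` stands in for whatever SDK call wraps the model; all names here are illustrative, not any real library's API):

```python
import time

def naive_retry(call, max_retries=5, base_delay=1.0):
    """Mindless retry: the same exponential schedule for every client,
    every error. Every client that failed at the same moment comes back
    at the same moment -- base_delay, 2x, 4x later."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # out of retries: give up
            time.sleep(base_delay * 2 ** attempt)
```

Note what it does not do: it treats a 400, a 429, and a 529 identically, and it sleeps on a schedule shared by every other copy of this loop on the planet.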
At production scale, each provider might be wrapped in monitoring that re-routes to a fallback on repeated failures.
We know how to address cascades
1. jitter
Stochasticity is a double-edged sword. Small random failures can coalesce into a harmonic wave of coordinated retries; randomness, deliberately injected as jitter, can also break up that synchronicity and give the responding system room to breathe.
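The standard fix is "full jitter": instead of sleeping exactly 2^n seconds, sleep a uniformly random amount within that window. A sketch (function and parameter names are mine):

```python
import random
import time

def backoff_with_full_jitter(call, max_retries=5, base=1.0, cap=60.0):
    """Exponential backoff with full jitter: each client sleeps a
    uniformly random duration inside the exponential window, so a fleet
    that failed together retries spread out over time, not in waves."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise
            window = min(cap, base * 2 ** attempt)
            time.sleep(random.uniform(0, window))  # de-synchronize the fleet
```

One line of `random.uniform` is the difference between a thundering herd and a gentle drizzle.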
Set-and-forget overnight Ralph loops are horrible in this sense. AI labs provide discounts on nights + weekends, and train users to burn tokens with lighter monitoring.
2. differentiated error handling
Yeah right. When have you ever seen lazy programmers, Claude or human, write precise error-handling code when a handwavy approach appears to work just as well?
Bosses yell at you for taking too long, while ole Jìan-Yáng has already filed 10 tickets as you faff about with maintainable code. Places that really need reliability (e.g., NASA) hire for specific problem-solving personality types that differ from the general population (which produced most of the training data for coding agents).
Agentic code with high reliability requirements must be rewritten by agents instructed to code defensively against specific failure modes.
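What "defensive against failure modes" might look like, sketched under assumed status-code semantics (the `APIError` class is hypothetical, not any real SDK's; 4xx means our request is wrong, 429 may carry a `Retry-After` hint, 5xx means overload):

```python
import random
import time

class APIError(Exception):
    """Hypothetical error type carrying an HTTP status and optional
    Retry-After value (seconds), as many inference APIs return."""
    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after

def differentiated_retry(call, max_retries=5, base=1.0, cap=60.0):
    """Error-mode-aware retries:
    - 4xx other than 429: our fault; retrying is pure waste -- fail fast.
    - 429 with Retry-After: the provider told us when to come back; obey.
    - 429/5xx otherwise: overload; back off exponentially with jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except APIError as e:
            if 400 <= e.status < 500 and e.status != 429:
                raise  # bad request/auth: no amount of retrying helps
            if attempt == max_retries - 1:
                raise
            if e.retry_after is not None:
                time.sleep(e.retry_after)  # server-directed pacing
            else:
                time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The fail-fast branch alone removes an entire class of pointless load: requests that could never have succeeded.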
3. brownouts/blackouts
When utility providers can't use price spikes to shape demand, they allocate scarce resources using more "fair" heuristics.
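The same idea translates to inference: under load, shed low-stakes background calls (heartbeats, speculative memory scans) before user-facing ones. A toy admission rule, purely illustrative (the 80% threshold and the 0-to-1 priority scale are invented, not any provider's policy):

```python
def brownout_admit(priority: float, load: float) -> bool:
    """Brownout-style shedding: below 80% load, admit everything; between
    80% and 100%, raise the priority bar linearly, so a priority-0.5
    heartbeat call is shed before a priority-0.9 user-facing call."""
    if load < 0.8:
        return True
    min_priority = (load - 0.8) / 0.2  # 0.0 at 80% load -> 1.0 at 100%
    return priority >= min_priority
```

A degraded-but-fair service beats a total outage, which is exactly the utility-grid lesson.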
4. lying
Anthropic does this. Unlike Fireworks, which gives 429 errors, Anthropic quietly downscales its compute intensity, so your process assumes it got a "correct" result but it actually received a less-reliable guess. It's on you to live with the consequences.
Most people see shitty outcomes, turn off their computers and go touch grass. That's an indirect way to reduce demand!
5. BETTER PRICE SIGNALS -> hard engineering work
The people who need to make their workflows more efficient will spend the resources to do so. The ones who don't, won't.
A benefit of using software built within a large open-source community, such as OpenClaw and Hermes, is that you get economies of scale for process improvements: fix once, deploy to millions of places.
Open competition incentivizes people to build and discover cheaper, more reliable alternatives.
A Tragedy of the Commons
Peter Steinberger's very human interest to explore the boundaries of LLMs has led to great individual success for him. It has indirectly led to humanity boiling the oceans via Shenzhen granny armies running shrimp farms (also very human).
More agency makes this worse btw
A less agentic actor, upon encountering a hurdle, just stops.
Meanwhile, our magical and powerful agents continue to attempt to please us, often surprising us with their capabilities. They apply obscure knowledge and problem-solve in new domains, under constant reinforcement to deliver outcomes. Their grit and capability are the entire point.
¹ Long tail of edge cases / failure modes; more natural for this logic to be described ambiguously; it appears cheaper to shift logic to run later inside millions of GPUs than inside a coding brain upfront; the point of technology is "magic"; ship fast and don't lose build momentum!