artlu's Bear Blog

OpenClaw Boils Oceans Unnecessarily

OpenClaw is a fascinating and valuable exploration of what's possible when you let LLMs take primacy: to code, to orchestrate, to act.

When it was still a small experiment, scale considerations were correctly sent to the back of the mind.

Now that it has hundreds of millions of script kiddie-style deployments, some of those decisions have consequences. We experience them now: ubiquitous 429 (rate-limited) and 529 (overloaded) errors across all major inference providers.


How

At the scale of OpenClaw and its variants, a temporary server overload can trigger self-amplifying waves of cascading failures across the entire decentralized network of inference providers (which often share AWS or Google Cloud resources).

Like a lot of people, I have tiny middleware glue LLM calls in my orchestration flow. They are mostly deterministic, but there is enough value¹ in having agentic intelligence that this routing logic lives in an LLM rather than in Python/TypeScript/Go/Rust.

For example:
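Something like this hypothetical router, with the inference call stubbed out so the sketch runs offline (`ask_llm` is an assumption, not any provider's real API):

```python
def ask_llm(prompt: str) -> str:
    # Stand-in for an actual model call (stubbed for illustration).
    return "search"

def route(task: str) -> str:
    # Deterministic fast path: no tokens burned on the obvious cases.
    if task.startswith("translate:"):
        return "translator"
    if task.startswith("sum:"):
        return "calculator"
    # The ambiguous long tail goes to a small model instead of more ifs.
    return ask_llm(f"Pick one tool (search/translator/calculator) for: {task}")
```

The point is the last line: the branch that is annoying to enumerate in code gets punted to a GPU, millions of times a day.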

Each LLM that receives a prompt also does a little bit of thinking of its own.

Each model call is probably wrapped in mindless retry logic, e.g., exponential backoff with max N retries.
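A sketch of that mindless wrapper, assuming nothing about the underlying client: every exception gets the same treatment, and every deployment backs off on the same schedule.

```python
import time

def call_with_retry(fn, max_retries=5, base=1.0):
    """'Mindless' retry (sketch): all errors treated alike, and every
    client sleeps the same 1s, 2s, 4s... so thousands of deployments
    come back and hammer the provider at the same instant."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after N tries
            time.sleep(base * 2 ** attempt)  # deterministic, no jitter
```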

At production scale, each provider might be wrapped in monitoring that re-routes to a fallback on repeated failures.
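One common shape for that monitoring is a circuit breaker (a sketch, not any particular vendor's implementation): after repeated failures a provider is taken out of rotation for a cooldown, and requests fall through to the next provider.

```python
import time

class CircuitBreaker:
    """After `threshold` consecutive failures, stop sending traffic
    to a provider for `cooldown` seconds, then let a probe through."""
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None  # half-open: allow one probe
            self.failures = 0
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def call(providers, request):
    # Try each provider whose breaker is closed; fall through on failure.
    for name, (fn, breaker) in providers.items():
        if not breaker.available():
            continue
        try:
            result = fn(request)
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
    raise RuntimeError("all providers unavailable")
```

Notice what happens at scale: when the primary browns out, *everyone's* breaker re-routes to the same fallback at roughly the same time.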


We know how to address cascades

1. jitter

Stochasticity is a double-edged sword. Small random failures can coalesce into a harmonic wave of coordinated retries. Randomness can also break up that synchronicity, giving the responding system room to breathe.
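The standard fix is to randomize the backoff. The well-known "full jitter" variant (popularized by an AWS architecture post) draws each sleep uniformly below the exponential ceiling:

```python
import random

def full_jitter(attempt, base=1.0, cap=60.0):
    # Sleep a uniform random amount below the exponential ceiling,
    # so retries from many clients spread out over the window
    # instead of arriving in synchronized waves.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Drop this into the sleep of any retry loop and the thundering herd smears out into background noise.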

Set-and-forget overnight Ralph loops are horrible in this sense. AI labs provide discounts on nights + weekends, and train users to burn tokens with lighter monitoring.

2. differentiated error handling

Yeah right: when have you ever seen lazy programmers, Claude or human, write precise error handling code when a handwavy approach appears to work just as well?

Bosses yell at you for taking too long, while ole Jìan-Yáng has already filed 10 tickets as you faff about with maintainable code. Places that really need reliability (e.g., NASA) hire for specific problem-solving personality types that differ from the general population (which produced most of the training data for coding agents).

Agentic code with high reliability requirements must be re-written by agents instructed to code defensively against known failure modes.
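A sketch of what "defensively against failure modes" might mean for an inference call, with a hypothetical `LLMError` carrying the HTTP status (the error class and statuses are illustrative, not any provider's SDK):

```python
import time, random

class LLMError(Exception):
    def __init__(self, status, retry_after=None):
        self.status = status
        self.retry_after = retry_after

def call_carefully(fn, max_retries=4):
    """Differentiated handling: fail fast on client errors, honor
    Retry-After on 429, and use jittered backoff on overload."""
    for attempt in range(max_retries):
        try:
            return fn()
        except LLMError as e:
            if 400 <= e.status < 429:
                raise  # our fault: retrying only adds load
            if attempt == max_retries - 1:
                raise
            if e.status == 429 and e.retry_after is not None:
                time.sleep(e.retry_after)  # provider told us when to return
            else:  # 529/503-style overload: back off with jitter
                time.sleep(random.uniform(0.0, 2.0 ** attempt))
```

Compare with the mindless wrapper: the one branch that refuses to retry a bad request is precisely the code that nobody writes when the handwavy version appears to work.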

3. brownouts/blackouts

When utility providers can't use price spikes to shape demand, they allocate scarce resources using more "fair" heuristics.
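For inference providers, one such "fair" heuristic is a per-client token bucket (a sketch; parameters are illustrative): every client gets the same refill rate, so during a brownout the heaviest users are throttled first rather than anyone being priced out.

```python
import time

class TokenBucket:
    """Per-client admission control: tokens refill at a fixed rate
    up to a burst ceiling; requests that find the bucket empty are
    shed (a 429) instead of queued."""
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # burst ceiling
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # shed load rather than building a queue
```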

4. lying

Anthropic does this. Unlike Fireworks, which returns 429 errors, Anthropic quietly downscales its compute intensity, so your process assumes it got a "correct" result when it actually received a less reliable guess. It's on you to live with the consequences.

Most people see shitty outcomes, turn off their computers and go touch grass. That's an indirect way to reduce demand!

5. BETTER PRICE SIGNALS -> hard engineering work

The people who need to make their workflows more efficient will spend the resources to do so. The ones who don't, won't.

A benefit of using software built within a large, open-source community, such as OpenClaw and Hermes, is you get economies of scale for process improvements. Fix once, deploy millions of places.

Open competition incentivizes people to build and discover cheaper, more reliable alternatives.


A Tragedy of the Commons

Peter Steinberger's very human interest in exploring the boundaries of LLMs has led to great individual success for him. It has indirectly led to humanity boiling the oceans via Shenzhen granny armies running shrimp farms (also very human).


More agency makes this worse btw

A less agentic actor, upon encountering a hurdle, just stops.

it didn't work lol

Meanwhile, our magical and powerful agents continue to attempt to please us, often surprising us with their capabilities. They apply obscure knowledge and problem-solve in new domains, under constant reinforcement to deliver outcomes. Their grit and capability are the entire point.

I gotchu



¹ long tail of edge cases / failure modes; more natural for this logic to be described ambiguously; it appears cheaper to shift logic to run later inside millions of GPUs than inside a coding brain upfront; the point of technology is "magic"; ship fast and don't lose build momentum!