An Expensive Little Snafu

15 Jun, 2026

I accidentally burned a couple hundred dollars on inference before finding the leak.

I had asked my agent harness to ingest a single large Markdown file. The LLM call never completed successfully, so I gave up, found another approach, and moved on. The harness, however, kept seeing the failed artifact and kept retrying in the background.
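One way to stop this kind of silent loop is a per-artifact retry cap. The following is a minimal sketch, not the harness's actual code: `RetryLedger`, `process`, `call_llm`, and the `max_attempts` value are all hypothetical names chosen for illustration. It assumes the harness can key failures by some artifact identifier.

```python
from dataclasses import dataclass, field


@dataclass
class RetryLedger:
    """Counts failures per artifact and parks anything that keeps failing.

    Hypothetical helper; a real harness would persist this to disk so the
    cap survives restarts.
    """
    max_attempts: int = 5
    attempts: dict[str, int] = field(default_factory=dict)
    parked: set[str] = field(default_factory=set)

    def should_try(self, artifact_id: str) -> bool:
        # Parked artifacts are skipped until a human clears them.
        return artifact_id not in self.parked

    def record_failure(self, artifact_id: str) -> None:
        count = self.attempts.get(artifact_id, 0) + 1
        self.attempts[artifact_id] = count
        if count >= self.max_attempts:
            self.parked.add(artifact_id)

    def record_success(self, artifact_id: str) -> None:
        self.attempts.pop(artifact_id, None)


def process(artifact_id: str, call_llm, ledger: RetryLedger) -> bool:
    """Run one attempt. Returns True on success, False if it failed or was skipped."""
    if not ledger.should_try(artifact_id):
        return False  # no LLM call is made, so nothing is billed
    try:
        call_llm(artifact_id)  # assumed to raise on failure
    except Exception:
        ledger.record_failure(artifact_id)
        return False
    ledger.record_success(artifact_id)
    return True
```

With a cap like this, a background loop can keep calling `process` indefinitely, but a stuck artifact only ever costs `max_attempts` calls.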

My routing is set up like this: use all daily-resetting credits¹ first, then fall back to more expensive pay-as-you-go inference. The goal is to never have downtime.

Each day, the retries burned through the daily credits in about half an hour, then kept billing for failed jobs for the remaining 23.5 hours.

This went on for about a week before I caught on.

My pay-as-you-go provider² encourages users to set a higher billing limit. When they get congested, they cut service to those on lower billing limits first.

Of course I prefer my agent to keep trying stuff (until I allow it to stop).

Of course I prefer my provider to give me higher uptime.

Failures are more costly, though.

Ironically, cost vigilance is required for a robust setup with resilient fallbacks.
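The credits-first, paid-fallback routing described earlier can carry its own spending cap. This is a rough sketch under stated assumptions: `Tier`, `pick_tier`, `BudgetExceeded`, the tier names, and the dollar limits are hypothetical, and it assumes each call's cost can be estimated up front and expressed in dollars for both tiers.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when every tier has hit its daily cap."""


@dataclass
class Tier:
    name: str
    daily_limit_usd: float        # hypothetical cap, reset once a day
    spent_today_usd: float = 0.0

    def can_afford(self, cost_usd: float) -> bool:
        return self.spent_today_usd + cost_usd <= self.daily_limit_usd

    def charge(self, cost_usd: float) -> None:
        # Charge on failures too, since failed jobs were still billed.
        self.spent_today_usd += cost_usd


def pick_tier(tiers: list[Tier], estimated_cost_usd: float) -> Tier:
    """Return the first tier, in priority order, with budget left."""
    for tier in tiers:
        if tier.can_afford(estimated_cost_usd):
            return tier
    raise BudgetExceeded("all tiers are at their daily cap")


# Priority order: daily credits first, pay-as-you-go as the backup.
tiers = [Tier("daily-credits", 2.0), Tier("pay-as-you-go", 3.0)]
```

The trade-off is explicit: once both caps are hit, the agent has downtime until the next reset, but a runaway retry loop can cost at most the paid tier's daily limit.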

¹ Venice.ai via staked $DIEM
² Fireworks