An Expensive Little Snafu
I accidentally burned a couple hundred dollars on inference before finding the leak.
I had asked my agent harness to ingest a single large Markdown file. The LLM call never completed successfully. I gave up, found another approach, and moved on. The harness kept seeing the failed artifact and kept retrying in the background.
My routing is set up like this: use all daily-resetting credits¹ first, then fall back to more expensive pay-as-you-go. Never have downtime.
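A minimal sketch of that routing rule, not my actual harness. The `Provider` class, the flat per-call cost, and the call-count credit model are all assumptions made up for illustration; real providers meter tokens, not calls.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    cost_per_call: float     # hypothetical flat cost in dollars
    daily_credit_calls: int  # calls covered by daily credits (0 = none)
    used_today: int = 0

    def has_credit(self) -> bool:
        return self.used_today < self.daily_credit_calls


def route_call(credit_tier: Provider, paid: Provider) -> float:
    """Use daily credits while any remain, else fall back to pay-as-you-go.

    Returns the dollar cost of the call under the assumed flat pricing.
    """
    provider = credit_tier if credit_tier.has_credit() else paid
    provider.used_today += 1
    return provider.cost_per_call


# Usage: two credit-covered calls, then every call is billed.
credit_tier = Provider("daily-credits", cost_per_call=0.0, daily_credit_calls=2)
paid = Provider("pay-as-you-go", cost_per_call=0.05, daily_credit_calls=0)
costs = [route_call(credit_tier, paid) for _ in range(4)]
```

Note that nothing in this rule distinguishes a useful call from a doomed retry. Once the credits are gone, every attempt is billed.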
Each day, the retries burned through the daily credits in about half an hour, then kept billing for failed jobs for the remaining 23.5 hours.
This went on for about a week before I caught on.
My pay-as-you-go provider² encourages users to set a higher billing limit. When they get congested, they cut service to those on lower billing limits first.
Of course I prefer my agent to keep trying stuff (until I allow it to stop)
Of course I prefer my provider to give me higher uptime
Failures are more costly, though
Cost vigilance is ironically required for a robust setup with resilient fallbacks
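One way to add that vigilance is a retry budget sitting in front of the fallback. This is a sketch under my own assumptions, not a feature of any particular harness: `RetryBudget`, `BudgetExceeded`, and both limits are hypothetical names and numbers.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a job should stop retrying and wait for a human."""


@dataclass
class RetryBudget:
    max_attempts_per_job: int = 5   # assumed cap on failures per job
    max_failed_spend: float = 10.0  # assumed dollar cap on failed calls
    attempts: dict = field(default_factory=dict)
    failed_spend: float = 0.0

    def check(self, job_id: str) -> None:
        """Call before each attempt; raises once either cap is reached."""
        if self.attempts.get(job_id, 0) >= self.max_attempts_per_job:
            raise BudgetExceeded(f"{job_id}: too many failed attempts")
        if self.failed_spend >= self.max_failed_spend:
            raise BudgetExceeded("failed-call spend cap reached")

    def record_failure(self, job_id: str, cost: float) -> None:
        """Call after a failed attempt, with what that attempt cost."""
        self.attempts[job_id] = self.attempts.get(job_id, 0) + 1
        self.failed_spend += cost


# Usage: a job that always fails stops after three attempts.
budget = RetryBudget(max_attempts_per_job=3, max_failed_spend=100.0)
tries = 0
stopped = False
for _ in range(10):
    try:
        budget.check("ingest-markdown")
    except BudgetExceeded:
        stopped = True
        break
    tries += 1
    budget.record_failure("ingest-markdown", cost=0.5)
```

The agent still retries and the fallback still keeps uptime high, but a job that only ever fails gets parked instead of billed around the clock.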