artlu's Bear Blog

Specs Plans Hooks and Skills Are Probably Enough

We are just about there.

I tried a current-best-practices-as-I-understand-them exercise: clone existing api contract (a "protocol") with a well-documented reference implementation, but re-implement from scratch in order to adapt to specific tradeoffs.

Principles:

  1. mountain climbing: stay pointed at goal, but let mountain determine each next step to take. Don't bulldoze the terrain following a pre-determined technical plan
  2. use AI to reduce slop

I used red-green-refactor TDD long-running agents1, day-shift-night-shift task separation, and multiple models to review/implement/coordinate. Fresh codebase. 3 commits, about 48 hours clock time, ~12 hours human time of plan-review-QA-supervision-guiding. Two $20/month coding plan subs (GLM and Cursor).

Looking at the codebase the agents produced, it has a clean layout, real tests, and architectural decisions that earn their keep. The code itself is professional enterprise quality:

My steering mostly involved:

I estimate in a non-tech Fortune 500 company, this would take 1-2 weeks, 20% of a program manager, 1 full-time mid-level developer, and 1-5% of an engineering manager to review the code (no more than 2 hours).

It cost $3 in tokens2 and 12 hours of my time.

Of course, I want to reduce HitM dependency, and also the fairly intense feeling that as soon as I take my hands off the steering wheel, this self-driving vehicle will either drive off the road, or reward me with a little dopamine magic. But it's gotten pretty good!


2024 Fortune 500 2026 agentic development role
8 hrs spec writing -> BRD 2 hrs planning chat -> Markdown scaffolding PM
72 hrs coding @ $40k-200k tc 16 hrs code generation @ ~$500 p.a. IC
2 hrs review @ $125k+ tc 10 hrs monitoring @ founder ramen manager

Note how human tasks are shifted entirely

Here's my project. I posit that the fastest-moving regulated bank could launch something new like this, in production, in no less than 2 weeks. Maybe a superstar team could collapse the execution part into 2 days, but only if they got pre-approval, leveraged existing building blocks, and management could clear the runway for launch3.

I could deploy a mostly-unstoppable ramp to global payment rails, in 2 days and less than $10 (plus ramen).



  1. I did not use multiple agent personalities (special QA agent, special architect persona) as I find these hokey. Maybe this would have helped. I can't tell if it's 80/20 impact-to-kayfabe, or 20/80. If you swear by this approach, I'd love to hear from you!

  2. subscribed at subsidized prices 3 months ago. Maybe $10 market price now

  3. this is all organizational work, that enterprise has decided is a necessary tax on productivity. In contrast, as an individual actor I haven't invented anything or jumped the shark on any regulation. So I don't consider those those control layers useful for society