Harness design

Trust is engineered, not assumed.

A harness is everything wrapped around a powerful, fallible actor — an LLM, a deploy script, a migration — that makes its power safe to use. I design harnesses the way other people design features. It's the most transferable thing I do.

[Diagram: propose (actor) → gate (deterministic) → execute (scoped + logged) → validate (against reality) → done. Refused at the gate → a human decides. Failed validation → rollback, automatic, not optional.]

The shape every harness reduces to. The actor can be brilliant or buggy — the loop doesn't care.
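As a rough illustration of that loop, here is a minimal Python sketch. Everything in it (`Proposal`, `run_harness`, the toy `gate`, the `state` dict) is a hypothetical name invented for the example, not code from any of the harnesses described here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Proposal:
    """What the actor hands to the harness (hypothetical shape)."""
    description: str
    apply: Callable[[], None]     # performs the change
    rollback: Callable[[], None]  # undoes the change
    validate: Callable[[], bool]  # checks reality after the change


def run_harness(proposal: Proposal,
                gate: Callable[[Proposal], bool],
                log: List[str]) -> str:
    """Gate, execute, validate; roll back automatically on failure."""
    if not gate(proposal):
        log.append(f"refused: {proposal.description}")
        return "needs_human"  # refused proposals go to a person
    log.append(f"execute: {proposal.description}")
    try:
        proposal.apply()
        ok = proposal.validate()
    except Exception as exc:  # a crash counts as a failed validation
        log.append(f"error: {exc!r}")
        ok = False
    if not ok:
        proposal.rollback()
        log.append(f"rolled back: {proposal.description}")
        return "rolled_back"
    log.append(f"done: {proposal.description}")
    return "done"


# Toy usage: the "system" is a dict, the gate is a fixed allow-list rule.
state = {"replicas": 2}
log: List[str] = []


def gate(p: Proposal) -> bool:
    return p.description.startswith("scale")


def healthy() -> bool:
    return state["replicas"] >= 1


bad = Proposal("scale to 0",
               apply=lambda: state.update(replicas=0),
               rollback=lambda: state.update(replicas=2),
               validate=healthy)
refused = Proposal("drop table", lambda: None, lambda: None, lambda: True)
good = Proposal("scale to 3",
                apply=lambda: state.update(replicas=3),
                rollback=lambda: state.update(replicas=2),
                validate=healthy)

outcome_bad = run_harness(bad, gate, log)
outcome_refused = run_harness(refused, gate, log)
outcome_good = run_harness(good, gate, log)
```

The gate never consults the actor, so the same loop behaves identically whether the proposal came from a model, a script, or a person.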

The catalog

Six harnesses I've built and operate.

HRN-01 · agent harness

The LLM never acts alone

My ops agent proposes; typed capability boundaries, pre-flight checks, and human approval gates dispose. Every proposal carries a blast-radius estimate computed from a deterministic dependency graph — not from the model's confidence.
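One way a deterministic blast-radius estimate could work is a plain graph walk: invert the dependency edges, then collect everything that transitively depends on the target. The sketch below assumes a made-up `DEPENDS_ON` map and a hypothetical `blast_radius` helper; the real graph and scoring are not shown on this page.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency graph: service -> services it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "web": ["api"],
    "api": ["db", "cache"],
    "worker": ["db"],
    "db": [],
    "cache": [],
}


def blast_radius(target: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every service that directly or transitively depends on target."""
    # Invert the edges: dependency -> set of direct dependents.
    dependents: Dict[str, Set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk outward from the target.
    affected: Set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent not in affected and dependent != target:
                affected.add(dependent)
                queue.append(dependent)
    return affected


db_radius = blast_radius("db", DEPENDS_ON)
cache_radius = blast_radius("cache", DEPENDS_ON)
web_radius = blast_radius("web", DEPENDS_ON)
```

The same input always yields the same set, which is the point: the estimate comes from the graph and nothing the model says can shrink it.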

HRN-02 · shadow harness

Autonomy is earned, not granted

New automation runs in shadow mode: it says what it would do and gets graded against what a human actually did. Only measured accuracy above a set threshold unlocks the execute path, and the same metric can revoke it.
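A minimal sketch of that grading, assuming a rolling window of comparisons and an exact-match notion of "agreed with the human". The class name `ShadowGrader` and the threshold, sample, and window values are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShadowGrader:
    """Tracks how often shadow proposals matched the human's action."""
    threshold: float = 0.95   # assumed value, tune per automation
    min_samples: int = 50     # assumed: no unlock on thin evidence
    window: int = 200         # assumed: only recent behaviour counts
    results: List[bool] = field(default_factory=list)

    def record(self, proposed: str, human_action: str) -> None:
        self.results.append(proposed == human_action)
        # Keep only the most recent `window` comparisons.
        del self.results[:-self.window]

    @property
    def accuracy(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

    def may_execute(self) -> bool:
        """Execute path is open only while accuracy stays above threshold."""
        return (len(self.results) >= self.min_samples
                and self.accuracy > self.threshold)


grader = ShadowGrader(threshold=0.9, min_samples=4, window=10)
for _ in range(4):
    grader.record("restart nginx", "restart nginx")
unlocked = grader.may_execute()

# A run of disagreements drags the rolling accuracy down and revokes access.
for _ in range(6):
    grader.record("restart nginx", "reload nginx")
revoked = not grader.may_execute()
```

Because `may_execute` is recomputed from the window on every call, there is no separate "revoke" step to forget.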

HRN-03 · deploy harness

Stage, then commit

Compile gates, atomic transfer, remote build under a supervisor that survives disconnects, then an end-to-end smoke test through the real front door. A deploy without its smoke test didn't happen.

HRN-04 · incident harness

Failures become rules

Every incident gets a written root-cause analysis with the full cascade, trigger to symptom. Recurrence converts the RCA into a permanent fix — an alert, a governor, an operating rule.

HRN-05 · verification harness

Read the file, not the "OK"

Mutating scripts must verify by reading back what they changed — a regex that didn't match still prints success. Learned at full price; now baked into every tool I ship.
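To make the failure mode concrete: `re.sub` returns the input unchanged when nothing matches, so a script that prints "OK" afterwards has proven nothing. The sketch below counts matches with `re.subn` and then re-reads the file. The helper `set_config_value`, the `VerificationError` class, and the `key = value` file format are assumptions made for the example.

```python
import re
import tempfile
from pathlib import Path


class VerificationError(RuntimeError):
    """Raised when a mutation cannot be confirmed by reading the file back."""


def set_config_value(path: Path, key: str, value: str) -> None:
    """Rewrite `key = ...` in a config file and verify the result on disk."""
    pattern = re.compile(rf"^{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key} = {value}"

    original = path.read_text()
    # subn reports how many replacements happened; zero means no match.
    updated, count = pattern.subn(lambda _m: new_line, original)
    if count != 1:
        raise VerificationError(
            f"expected exactly 1 line for {key!r}, matched {count}")
    path.write_text(updated)

    # Read back from disk instead of trusting the write.
    if new_line not in path.read_text().splitlines():
        raise VerificationError(f"{key!r} not found in {path} after write")


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "app.conf"
    cfg.write_text("max_conn = 10\ntimeout = 30\n")

    set_config_value(cfg, "timeout", "60")
    changed = cfg.read_text()

    try:
        set_config_value(cfg, "time_out", "90")  # typo: key does not exist
        miss_detected = False
    except VerificationError:
        miss_detected = True
    after_miss = cfg.read_text()
```

The misspelled key raises instead of silently succeeding, and the file is left exactly as the last verified write left it.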

HRN-06 · the forever loop · next

Any problem → ticket → plan → window → reviewed change → validated outcome → repeat

The end-state the whole system converges toward. Security findings were just the first input source; the loop doesn't care where problems come from — only that every one leaves a record and a fix behind.

Operating rules

Written in incident blood.

Blast radius first

Before any change: what depends on this, what breaks, what's the rollback? One host at a time. Backups before, not after.

Find the cause under the cause

The disk that filled is the trigger; the pipeline that leaks files for weeks is the cause. Fixes aimed at triggers buy days. Causes buy years.

Pin what holds state

Databases and stateful apps ride pinned versions, upgraded deliberately with backups — never dragged forward by a :latest tag at 2 a.m.
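A pinning rule like this can be enforced mechanically, for example with a pre-merge check that rejects floating image references. The sketch below is one possible check; `is_pinned` is a hypothetical helper, and it relies only on the fact that a container image reference with no tag resolves to `latest`.

```python
def is_pinned(image: str) -> bool:
    """True if the image reference names an explicit non-latest tag or a digest."""
    if "@sha256:" in image:
        return True  # digest pins are the strictest form
    # Look only at the last path segment so a registry port
    # (registry.local:5000/...) is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag means an implicit :latest
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


# Hypothetical list of images used by stateful services.
stateful_images = [
    "postgres:16.3",
    "registry.local:5000/db/postgres",
    "redis:latest",
]
unpinned = [img for img in stateful_images if not is_pinned(img)]
```

Failing the merge request when `unpinned` is non-empty turns the rule into a gate instead of a habit.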

Everything becomes a record

Code through merge requests, incidents through tickets, lessons through written rules. If it isn't recorded, it will be re-learned at full price.

Have automation you don't quite trust?

That's my favorite problem.