Field work

Things I've actually shipped.

Everything below runs — or ran — on real infrastructure with real consequences. Filter by what interests you; the full case studies stack below.

Case studies

The longer stories.

PRJ-001 · live in production

The AIOps Kernel — a lab that heals itself

Alerts from Prometheus/Alertmanager flow through a workflow engine into an ERP-grade ticketing system, deduplicated by fingerprint so a flapping alert never becomes eighty tickets. A planning layer consolidates open findings — at one point 216 tickets — into 16 sequenced, dependency-aware maintenance plans that respect cluster quorum, gateway, and domain-controller ordering.
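A minimal sketch of fingerprint deduplication, not the production code. The `fingerprint` hash, the `Ticket` fields, and the `TicketStore` class are hypothetical stand-ins for whatever the real workflow engine and ticketing system use; the only idea taken from the text is that a repeat of the same alert updates an existing ticket instead of opening a new one.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(labels: dict[str, str]) -> str:
    """Stable hash of an alert's label set (sorted, so key order does not matter)."""
    canonical = "\n".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


@dataclass
class Ticket:
    fingerprint: str
    labels: dict[str, str]
    occurrences: int = 1


class TicketStore:
    """In-memory stand-in for the ticketing system (assumption)."""

    def __init__(self) -> None:
        self._open: dict[str, Ticket] = {}

    def ingest(self, labels: dict[str, str]) -> Ticket:
        fp = fingerprint(labels)
        existing = self._open.get(fp)
        if existing is not None:
            # Same alert firing again: count it, do not open a new ticket.
            existing.occurrences += 1
            return existing
        ticket = Ticket(fp, dict(labels))
        self._open[fp] = ticket
        return ticket

    def open_count(self) -> int:
        return len(self._open)
```

Feeding the same label set to `ingest` eighty times leaves one open ticket with `occurrences == 80`.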

The repair side runs in shadow mode by default: proposed actions are graded against reality, and autonomy is only granted when measured accuracy earns it. The first operator-approved automated repair — a CVE patch on a cluster node — has already shipped. Rule-tuning is part of the system, too: one mis-tuned alert rule was traced as the source of 85 false-positive tickets and fixed at the rule, not the symptom.
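The shadow-mode gate can be sketched roughly as below. This is an illustration under assumptions: the `ShadowGrader` name, the string comparison of proposed versus actual fix, and both thresholds are hypothetical, chosen only to show "autonomy is earned by measured accuracy".

```python
from dataclasses import dataclass


@dataclass
class ShadowGrader:
    # Hypothetical thresholds; the real system's values are not stated in the text.
    min_samples: int = 20
    min_accuracy: float = 0.95
    graded: int = 0
    correct: int = 0

    def record(self, proposed: str, actual: str) -> None:
        """Grade one shadow proposal against what was actually done."""
        self.graded += 1
        if proposed == actual:
            self.correct += 1

    @property
    def accuracy(self) -> float:
        return self.correct / self.graded if self.graded else 0.0

    def autonomy_granted(self) -> bool:
        # Both conditions must hold: enough evidence and a high enough hit rate.
        return self.graded >= self.min_samples and self.accuracy >= self.min_accuracy
```

Until `autonomy_granted()` returns true, proposals are only logged and graded, never executed.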

PRJ-002 · active build

Hermes — an agent brain with its hands tied properly

Most "AI agents" put a language model directly in the action path and hope. Hermes inverts that: the LLM proposes, deterministic machinery disposes. Every action passes through typed capability boundaries, pre-flight checks, and approve → execute → rollback semantics. It consumes tickets from the ops loop, reasons over the lab world-model, and attaches a blast-radius estimate to every proposal.
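One way to picture "the LLM proposes, deterministic machinery disposes" is the sketch below. Everything in it is an assumption made for illustration: the `Proposal` fields, the `run` function, and the returned status strings are hypothetical and are not Hermes's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Proposal:
    capability: str                 # typed capability the action claims to need
    target: str
    blast_radius: int               # estimated number of affected services
    preflight: Callable[[], bool]   # deterministic check run before anything changes
    execute: Callable[[], None]
    rollback: Callable[[], None]


def run(proposal: Proposal,
        allowed: frozenset[str],
        approve: Callable[[Proposal], bool]) -> str:
    """Gate a proposed action; the model never calls execute() itself."""
    if proposal.capability not in allowed:
        return "rejected: capability not allowed"
    if not proposal.preflight():
        return "rejected: preflight failed"
    if not approve(proposal):
        return "rejected: not approved"
    try:
        proposal.execute()
    except Exception:
        # Any failure during execution triggers the paired rollback.
        proposal.rollback()
        return "rolled back"
    return "executed"
```

The approver can be a human or a policy; here it is just a callable that sees the whole proposal, including its blast-radius estimate.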

When a polished open-source agent appeared that looked similar, I did the comparison honestly: harvested its fourteen deterministic safety modules for ideas, and kept my inverse-design architecture — because an LLM-in-the-action-path design can't be patched into trustworthiness.

PRJ-003 · boots on hardware

Ophanim — full Docker on a consumer phone

Android ships without the kernel namespaces and cgroup configuration Docker needs, and actively fights you on the rest. I built a custom kernel from source — clang, careful config surgery on a vendor 4.14 tree — then solved the chain the platform throws at you: mount propagation that breaks pivot_root, cgroup layouts dockerd doesn't expect, a libc that lies. The result: dockerd running natively on a flagship phone, serving a containerized HTTP workload.

It's now growing a management layer — an on-device container store, lifecycle manager, and a custom metrics daemon that reads /dev/memcg directly, because Android's stats APIs return zeros. Equal parts systems engineering and stubbornness.

PRJ-004 · phase 0 proven

Lab World-Model — blast radius in few tokens

AI agents are expensive and slow when they rediscover infrastructure by SSH every time. The world-model is a deterministic dependency graph — 16 hosts and 678 services inventoried from the SIEM's data spine, overlaid with hand-recorded operational hazards — plus a retriever that answers "what breaks if this goes down?" without touching a single host. Phase 0 proved the inventory with zero SSH connections.
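The "what breaks if this goes down?" query amounts to walking a dependency graph in reverse. A minimal sketch, assuming the graph is stored as a mapping from each service to the services it depends on; the function name and the example service names are hypothetical, and the real retriever is not described in enough detail here to reproduce.

```python
from collections import deque


def blast_radius(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges: for each dependency, who relies on it?
    dependents: dict[str, set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service != failed and service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


# Hypothetical graph for illustration only.
graph = {
    "sso": {"dns"},
    "gitlab": {"sso"},
    "wiki": {"sso"},
    "metrics": set(),
}
```

With this graph, `blast_radius(graph, "dns")` yields `sso`, `gitlab`, and `wiki`, and no host is contacted to compute it.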

PRJ-005 · design complete

Janus — identity, rebuilt cloud-native

A ground-up replacement for an Active Directory domain: modern SSO and directory (Authentik) plus authoritative DNS (PowerDNS), designed dual-site active-active on Kubernetes so identity survives the loss of either site. The migration is staged with rollback at every phase — because identity is the one system you don't get to break. Designed epic-first: eight tracked workstreams before a single manifest was applied.

PRJ-006 · live

Secure GPU gateways — private compute, public reach

Two hardened single-purpose gateways publish private GPU services to the internet without trusting it: an LLM inference endpoint and a security-research rig. The pattern: mutual TLS with a per-service internal CA, Argon2id-hashed bearer credentials, Cloudflare Zero Trust service tokens at the edge, and a deploy harness with compile gates and end-to-end smoke tests. One audit turned up an old route exposing the raw, unauthenticated inference API: found it, killed it, wrote it down.

PRJ-007 · live

Broadcast-grade media engineering

A fully automated media platform: self-healing acquisition pipelines with import watchdogs (the failure mode that once filled a 492 GB volume now has a governor), GPU-accelerated transcoding, artwork pipelines, and a faithful recreation of early-2000s cable television — daypart-accurate schedules, era-correct commercial breaks — streamed as live TV. Plus a custom visual discovery app for the "nothing to watch" problem.

PRJ-008 · phase 1 done

Maintenance Planner — from chaos to calendar

The bridge between "the system found 216 problems" and "here is this month's maintenance calendar." The planner classifies findings per host, decides reboot-then-upgrade versus targeted patching, sequences work so cluster quorum and the gateway never go down together, and books the windows into a shared calendar. Cybersecurity findings are just its first input source.
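The sequencing constraint can be illustrated with a greedy window assignment. This is a sketch under assumptions, not the planner's actual algorithm: hosts that must never be down together are given as conflicting pairs, and each host goes into the first maintenance window that contains none of its conflicts. The `schedule` function and all host names are hypothetical.

```python
from itertools import combinations


def schedule(hosts: list[str], conflicts: set[frozenset[str]]) -> list[list[str]]:
    """Assign hosts to maintenance windows so no window holds a conflicting pair."""
    windows: list[list[str]] = []
    for host in hosts:
        for window in windows:
            if all(frozenset((host, other)) not in conflicts for other in window):
                window.append(host)
                break
        else:
            # No existing window is safe for this host: open a new one.
            windows.append([host])
    return windows


# Hypothetical lab: three quorum members and a gateway that must never overlap.
critical = ["node1", "node2", "node3", "gateway"]
conflicts = {frozenset(pair) for pair in combinations(critical, 2)}
plan = schedule(critical + ["nas"], conflicts)
```

Here each critical host gets its own window and `nas`, which conflicts with nothing, shares the first one. Booking the windows into a calendar is left out of the sketch.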

PRJ-009 · you're in it

This website — the stack, demonstrated

Hand-written HTML, CSS, and JavaScript — no framework, no template, no tracker. A custom WebGL background, served by nginx in a container on my own cluster, published through a Cloudflare tunnel with zero inbound ports, deployed from a GitLab repo via merge request. The contact form files straight into my self-hosted ERP's helpdesk. The site is small; the plumbing is the résumé.

Want the full story on any of these?

Ask me directly