/dev

Projects and technical write-ups.

Spec engineering for AI-assisted delivery

May 19, 2026

A five-document /specs pipeline that sits between an idea and a coding agent. Slower at the front, cheaper everywhere after. Each document answers one question and refuses the others; that refusal is the whole trick.

Hard rails for an autonomous code-audit agent

May 6, 2026

An autonomous audit agent runs across my repos every 48 hours — finds bugs, fixes safe ones in a draft PR, reports the rest. The most important part of the prompt isn't what to look for; it's the list of things it must never do.

Two bottlenecks that killed my capture pipeline

May 6, 2026

ScreenSearch sat at 20% CPU at idle for weeks. Two specific things were doing all the work: synchronous FTS5 triggers inside the capture transaction, and OCR running on every frame regardless of change. Fixing either one alone cut idle CPU in half; fixing both brought it down to ~3% on the same hardware.

Walking Authentik through ten major versions

Apr 30, 2026

A self-hosted Authentik instance pinned to August 2024, eighteen months and ten major versions behind, with an upstream that forbids skipping. The migration plan that walked it through all ten hops in an afternoon — including moving custom CSS from a bind-mount to a native DB field — without anyone losing an SSO session.

Summarizing thirty minutes of audio on a phone

Apr 29, 2026

An Android meeting recorder that does transcription, diarisation, and summarisation entirely on-device: no audio leaves the phone. The five-stage map-reduce pipeline that summarised thirty-plus minutes of conversation through a 1B LLM with a 4k context, the prompts that compress without lying, and the thing that broke at chunk 14.

Shipping an embedded LLM with a desktop app

Apr 22, 2026

ScreenSearch ships a 3B LLM that runs entirely on the user's machine — no keys, no telemetry, no trust-us. The lifecycle pattern that made it reliable: lazy start, idle TTL, port fallback, crash recovery, Vulkan acceleration. About 600 lines of Rust around llama-server.

Splitting Immich across two boxes

Apr 17, 2026

Immich on a low-power N100 is great until you ask it to face-recognise 85,000 photos. My split keeps the always-on box serving UI and database while an old GPU box only wakes up for ML work — with an nginx shim that hides the seam when the GPU box is asleep.

Building ScreenSearch

Apr 8, 2026

A tour of ScreenSearch: a privacy-first Windows 'memory' written in Rust. Continuous capture, OCR on changed regions, FTS5 in SQLite, and an embedded 3B LLM for cross-day queries. Nothing leaves the machine. The architecture, the decisions reversed, the parts thrown away.

Wrapping Komodo in an MCP server

Mar 25, 2026

An MCP server in TypeScript that exposes the Komodo container-manager API to Claude Code: 15 read tools, 12 execute tools, 8 write tools. The interesting part isn't the API wrapping — it's the partitioning, and the deliberate friction between an agent and the buttons that can take production down.

Atomic rate-limiting on the Cloudflare free tier

Mar 11, 2026

TrackerSync's old KV-backed rate-limiter let abuse through under concurrency. Replacing it with an atomic D1 + Durable Objects design — keeping KV only for graceful degradation — fixed the race without leaving the Cloudflare free plan.

A VPN kill-switch that actually kills

Mar 10, 2026

Binding qBittorrent to a VPN interface is a kill-switch that usually works. Putting it inside the VPN container's network namespace is a kill-switch that always works. The one-line docker-compose pattern, and why it beats the Reddit-standard approach.

Cloudflare Tunnel as a homelab front door

Mar 4, 2026

Twenty self-hosted services reachable from anywhere, zero inbound ports open. One cloudflared container, one outbound tunnel, one shared Docker bridge that any stack can join in one line. The architectural choice that turns 'Cloudflare Tunnel works' into 'Cloudflare Tunnel scales.'

Killing KV in the hot path

Mar 4, 2026

In early 2026, every TrackerSync request was spending 6–9 KV operations before doing real work — enough to burn through the Cloudflare free tier in hours. Moving rate-limiting to a Durable Object and metadata to D1 cut KV out of the happy path entirely.

Three ways to make Qwen3 stop thinking out loud

Feb 25, 2026

Qwen3 emits <think> blocks before every answer. Useful for researchers, expensive for tools. Three places you can install an off-switch — prompt, parameters, Modelfile — and only one survives every transport. Which one, why, and how the other two will mislead you.

The reasoning tax

Feb 18, 2026

Most coding work doesn't need the most expensive model in the room. Planning needs reasoning. Implementation needs adherence. Debugging needs reasoning again. The tiered setup I run, the reasoning tax I refuse to pay, and the per-session math that made me stop apologising for cheap models.

Making a local Whisper server pretend to be OpenAI

Jan 21, 2026

A 50-line FastAPI shim that makes a local whisper-large-v3 server speak fluent OpenAI on /v1/audio/transcriptions. Enough that Open WebUI, LM Studio, and a pile of scripts written for the real API never notice the difference — plus the compatibility traps that took two evenings to find.