Projects and technical write-ups.
Five-document /specs pipeline that sits between an idea and a coding agent. Slower at the front, cheaper everywhere after. Each document answers one question and refuses the others — that refusal is the whole trick.
An autonomous audit agent runs across my repos every 48 hours — finds bugs, fixes safe ones in a draft PR, reports the rest. The most important part of the prompt isn't what to look for; it's the list of things it must never do.
ScreenSearch sat at 20% CPU at idle for weeks. Two specific things were doing all the work: synchronous FTS5 triggers inside the capture transaction, and OCR running on every frame regardless of change. Pulling either one out roughly halved CPU; with both gone, idle dropped to ~3% on the same hardware.
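The "OCR only when something changed" idea can be sketched as a small gate in front of the OCR call. This is an illustrative Python sketch, not ScreenSearch's Rust code: the `FrameGate` class, the `should_ocr` method, and the choice of a whole-frame hash (rather than per-region comparison) are all assumptions made for the example.

```python
import hashlib
from typing import Optional


class FrameGate:
    """Skip OCR when a frame's bytes match the previously seen frame.

    Hypothetical helper: compares a digest of the whole frame, which only
    catches byte-identical frames, not "visually similar" ones.
    """

    def __init__(self) -> None:
        self._last_digest: Optional[bytes] = None

    def should_ocr(self, frame: bytes) -> bool:
        digest = hashlib.blake2b(frame, digest_size=16).digest()
        if digest == self._last_digest:
            return False  # unchanged since last capture: no OCR needed
        self._last_digest = digest
        return True
```

A capture loop would call `should_ocr(frame)` before handing the frame to OCR, so an idle screen costs one hash per capture instead of a full OCR pass.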
A self-hosted Authentik instance pinned to August 2024, eighteen months and ten major versions behind, with an upstream that forbids skipping. The migration plan that walked it through all ten hops in an afternoon — including moving custom CSS from a bind-mount to a native DB field — without anyone losing an SSO session.
An Android meeting-recorder that does transcription, diarisation, and summarisation entirely on-device — no audio leaves the phone. The five-stage map-reduce pipeline that summarised thirty-plus minutes of conversation through a 1B LLM with a 4k context, the prompts that compress without lying, and the thing that broke at chunk 14.
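For readers unfamiliar with map-reduce summarisation, here is the general shape in Python. It is a generic sketch and does not reproduce the app's five stages or its prompts: `chunk_words`, `map_reduce_summary`, and the injected `summarise` callable are hypothetical names, and a word count stands in for a real token budget.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def map_reduce_summary(text: str, summarise: Callable[[str], str], max_words: int) -> str:
    """Summarise each chunk (map), then re-summarise the joined partial
    summaries (reduce) until everything fits in a single chunk."""
    current = text
    while True:
        parts = chunk_words(current, max_words)
        if not parts:
            return ""
        partials = [summarise(p) for p in parts]
        if len(partials) == 1:
            return partials[0]
        merged = " ".join(partials)
        # Guard against a summariser that fails to compress, which would loop forever.
        if len(merged.split()) >= len(current.split()):
            raise ValueError("summarise() did not shrink the text")
        current = merged
```

In a real pipeline `summarise` would wrap the on-device model call, and `max_words` would be chosen to leave room in the context window for the prompt itself.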
ScreenSearch ships a 3B LLM that runs entirely on the user's machine — no keys, no telemetry, no trust-us. The lifecycle pattern that made it reliable: lazy start, idle TTL, port fallback, crash recovery, Vulkan acceleration. About 600 lines of Rust around llama-server.
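Of the lifecycle pieces listed, port fallback is the easiest to show in isolation. The sketch below is Python rather than the project's Rust, and `pick_port` is a hypothetical helper: it tries a preferred port and otherwise asks the OS for any free one. It assumes the server is started right after the check; another process could still grab the port in between.

```python
import socket


def pick_port(preferred: int, host: str = "127.0.0.1") -> int:
    """Return `preferred` if it can be bound, else an OS-assigned free port."""
    for candidate in (preferred, 0):  # port 0 asks the OS to choose
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, candidate))
            except OSError:
                continue  # port is taken or not permitted: try the fallback
            return probe.getsockname()[1]
    raise RuntimeError("no free port available")
```

The returned port would then be passed to the server process on its command line and remembered by the client side for as long as that process lives.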
Immich on a low-power N100 is great until you ask it to face-recognise 85,000 photos. My split keeps the always-on box serving UI and database while an old GPU box only wakes up for ML work — with an nginx shim that hides the seam when the GPU box is asleep.
A tour of ScreenSearch: a privacy-first Windows 'memory' written in Rust. Continuous capture, OCR on changed regions, FTS5 in SQLite, and an embedded 3B LLM for cross-day queries. Nothing leaves the machine. The architecture, the decisions reversed, the parts thrown away.
An MCP server in TypeScript that exposes the Komodo container-manager API to Claude Code: 15 read tools, 12 execute tools, 8 write tools. The interesting part isn't the API wrapping — it's the partitioning, and the deliberate friction between an agent and the buttons that can take production down.
TrackerSync's old KV-backed rate-limiter let abuse through under concurrency. Replacing it with an atomic D1 + Durable Objects design — keeping KV only for graceful degradation — fixed the race without leaving the Cloudflare free plan.
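The race in a KV-style limiter comes from reading the counter and writing it back as two separate steps, so concurrent requests can all see the same "under the limit" value. The Python sketch below shows the fix in miniature: check and increment happen as one step under a lock, which is roughly the guarantee a single serialised owner of the counter gives you. `FixedWindowLimiter` and its parameters are illustrative assumptions, not TrackerSync's code, and the fixed-window policy is chosen only for brevity.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` calls per key in each `window_s`-second window."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._window_s = window_s
        self._clock = clock
        self._lock = threading.Lock()
        # Old windows are never evicted here; a real limiter would expire them.
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str) -> bool:
        window = int(self._clock() // self._window_s)
        with self._lock:
            # Read, compare, and increment as a single atomic step.
            used = self._counts.get((key, window), 0)
            if used >= self._limit:
                return False
            self._counts[(key, window)] = used + 1
            return True
```

With the lock removed and a slow store in between the read and the write, twenty concurrent callers could all be admitted against a limit of three; with it, exactly three get through.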
Binding qBittorrent to a VPN interface is a kill-switch that usually works. Putting it inside the VPN container's network namespace is a kill-switch that always works. The one-line docker-compose pattern, and why it beats the Reddit-standard approach.
Twenty self-hosted services reachable from anywhere, zero inbound ports open. One cloudflared container, one outbound tunnel, one shared Docker bridge any stack can join with a single line. The architectural choice that turns 'Cloudflare Tunnel works' into 'Cloudflare Tunnel scales.'
In early 2026, every TrackerSync request was spending 6–9 KV operations before doing real work — enough to burn through the Cloudflare free tier in hours. Moving rate-limiting to a Durable Object and metadata to D1 cut KV out of the happy path entirely.
Qwen3 emits <think> blocks before every answer. Useful for researchers, expensive for tools. Three places you can install an off-switch — prompt, parameters, Modelfile — and only one survives every transport. Which one, why, and how the other two will mislead you.
Most coding work doesn't need the most expensive model in the room. Planning needs reasoning. Implementation needs adherence. Debugging needs reasoning again. The tiered setup I run, the reasoning tax I refuse to pay, and the per-session math that made me stop apologising for cheap models.
A 50-line FastAPI shim that makes a local whisper-large-v3 server speak fluent OpenAI on /v1/audio/transcriptions. Enough that Open WebUI, LM Studio, and a pile of scripts written for the real API never notice the difference — plus the compatibility traps that took two evenings to find.