ScreenSearch sat at 20%+ CPU at idle for an embarrassing number of weeks. I told myself this was the price of running a continuous capture loop. It wasn’t. Two specific things were doing all the work: synchronous FTS5 triggers inside the capture transaction, and OCR running on every frame regardless of whether the screen had changed. Each fix cut idle CPU by half or more. Together they got me to ~3% idle, on the same hardware, with no other changes.
The chart that wouldn’t go down
I keep a tiny dashboard while ScreenSearch is running. It plots CPU, GPU, memory, and capture rate. For the first month, “CPU at idle” was a thick green band hovering between 18% and 24%, with spikes up to 35% when I had a busy screen. I had explanations for this — OCR is expensive, screens have a lot of pixels, my laptop is three years old.
The explanations were wrong. The CPU was high because of two things I had built into the system on purpose, and could remove without losing anything.
Bottleneck 1: every write blocked on FTS
The first version had a single SQLite schema with an FTS5 virtual table mirroring the OCR text, kept in sync by triggers:
CREATE TABLE ocr_text (
    id INTEGER PRIMARY KEY,
    capture_id INTEGER NOT NULL,
    text TEXT NOT NULL,
    created_at INTEGER NOT NULL
);
CREATE VIRTUAL TABLE ocr_text_fts USING fts5(
    text,
    content='ocr_text',
    content_rowid='id'
);
-- the lines that did the damage:
CREATE TRIGGER ocr_text_ai AFTER INSERT ON ocr_text BEGIN
    INSERT INTO ocr_text_fts(rowid, text) VALUES (new.id, new.text);
END;
It’s textbook FTS5. The official SQLite docs recommend exactly this pattern. The pattern is also a small disaster for a high-throughput writer.
Every INSERT INTO ocr_text synchronously ran an INSERT INTO ocr_text_fts inside the same transaction. FTS5 tokenizes the text and updates its inverted index on every insert, which can include merging index segments. The capture loop was waiting on that work, on the main capture thread, while the next frame’s deadline approached.
The signature of the problem in tracing logs:
2025-12-04T14:32:11Z DEBUG captured frame in 41ms
2025-12-04T14:32:11Z DEBUG ocr completed in 89ms
2025-12-04T14:32:14Z DEBUG db insert completed in 2810ms ← here
2025-12-04T14:32:14Z WARN capture deadline missed: 3s budget, 2.94s spent
2.8 seconds of an alleged 3-second budget, all on a db insert that was advertised as cheap. The trace was right; the insert wasn’t cheap because of what the trigger was doing under it.
The fix: separate the writer from the indexer
The architecture moved from synchronous-trigger to background-worker:
BEFORE
─────────────────────────────────────────────────────
Capture → OCR → INSERT (trigger updates FTS) → BLOCKS
                  ▲
                  │ capture thread is here

AFTER
─────────────────────────────────────────────────────
Capture → OCR → INSERT into ocr_text → returns

┌────────────────────────────────────┐
│ Background indexer (own thread)    │
│ - reads ocr_text where id > X      │
│ - batches 100 rows per transaction │
│ - inserts into ocr_text_fts        │
└────────────────────────────────────┘
The trigger is gone. A separate Rust thread polls ocr_text for rows beyond the last-indexed id, batches a hundred at a time into a single FTS transaction, and goes back to sleep. If the indexer crashes, capture is unaffected — when it restarts, it picks up where it left off by reading MAX(rowid) from ocr_text_fts.
The relevant part of the indexer:
fn process_batch(&mut self, last_id: i64) -> Result<usize> {
    let mut conn = self.db.connection_mut();
    let tx = conn.transaction()?;
    let mut count = 0;
    {
        // Both prepared statements borrow `tx`, so they live in this inner
        // scope and are dropped before the commit below consumes it.
        let mut select_stmt = tx.prepare(
            "SELECT id, text FROM ocr_text
             WHERE id > ?1 ORDER BY id ASC LIMIT ?2"
        )?;
        let rows = select_stmt.query_map(
            params![last_id, self.batch_size],
            |row| Ok((row.get::<_, i64>(0)?, row.get::<_, String>(1)?))
        )?;
        let mut insert_stmt = tx.prepare(
            "INSERT INTO ocr_text_fts (rowid, text) VALUES (?1, ?2)"
        )?;
        // Copy each not-yet-indexed row into the FTS table.
        for row in rows {
            let (id, text) = row?;
            insert_stmt.execute(params![id, text])?;
            count += 1;
        }
    }
    // An empty batch has nothing to commit; dropping `tx` rolls it back.
    if count > 0 { tx.commit()?; }
    Ok(count)
}
Capture-side latency dropped from 2.8s-per-insert to ~40ms. The indexer keeps up — there’s a steady-state lag of a few seconds between “frame captured” and “frame searchable,” which is fine because nobody is searching for a frame that was captured five seconds ago.
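The resume-from-MAX(rowid) behaviour is easy to check in isolation. The sketch below is not the ScreenSearch indexer: it swaps SQLite for two in-memory maps, hypothetical stand-ins for ocr_text and ocr_text_fts, so the cursor logic can run without a database. Store, last_indexed_id and catch_up are names invented for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-ins for the two tables, keyed by row id.
struct Store {
    ocr_text: BTreeMap<i64, String>,
    fts: BTreeMap<i64, String>,
}

impl Store {
    /// Stand-in for `SELECT COALESCE(MAX(rowid), 0) FROM ocr_text_fts`.
    fn last_indexed_id(&self) -> i64 {
        self.fts.keys().next_back().copied().unwrap_or(0)
    }

    /// Index up to `batch_size` rows with id > `last_id`, in id order.
    /// Returns how many rows were indexed.
    fn process_batch(&mut self, last_id: i64, batch_size: usize) -> usize {
        let batch: Vec<(i64, String)> = self
            .ocr_text
            .range((last_id + 1)..)
            .take(batch_size)
            .map(|(id, text)| (*id, text.clone()))
            .collect();
        let count = batch.len();
        self.fts.extend(batch);
        count
    }
}

/// Drain everything pending, re-reading the cursor before each batch.
/// Returns the total number of rows indexed.
fn catch_up(store: &mut Store, batch_size: usize) -> usize {
    let mut total = 0;
    loop {
        let last_id = store.last_indexed_id();
        let indexed = store.process_batch(last_id, batch_size);
        if indexed == 0 {
            return total;
        }
        total += indexed;
    }
}
```

Because the cursor is derived from what is already in the index rather than held in memory, a restart after a partial run simply continues from the first unindexed row. In a real worker, the loop would sleep when a batch comes back empty instead of returning.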
Bottleneck 2: OCR on every frame, including frames that hadn’t changed
The second bottleneck was philosophically embarrassing. The capture loop ran every 3 seconds; every captured frame went to OCR; OCR is expensive; therefore CPU was high.
But most frames are the same as the frame before them. I was watching a static editor for ten minutes; my screen didn’t change for ten minutes; OCR ran 200 times on the same image. Each run was identical and each result was identical and each result was discarded because the dedup check ran after the OCR.
The fix is to dedup before OCR:
// before OCR: compute a fast perceptual hash
let hash = phash(&frame);
if let Some(prev_hash) = self.last_hash {
    if hamming_distance(hash, prev_hash) < THRESHOLD {
        // screen unchanged, skip OCR
        return Ok(SkipReason::Unchanged);
    }
}
self.last_hash = Some(hash);
// only now do the expensive thing
let text = self.ocr.process(&frame)?;
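phash (perceptual hash) is cheap: about 0.4ms on a 1080p frame. The Hamming distance comparison is cheaper still. Together they let me skip OCR for any frame whose perceptual hash is close enough to the hash of the last frame that actually went through OCR. Note that last_hash is only updated when a frame passes the check, so the baseline stays put while frames are being skipped and slow drift still adds up to a detected change.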
The threshold matters. Too tight and a blinking cursor counts as a change. Too loose and small but real changes (a notification appearing in the corner) get missed. I landed on 8 (out of 64), which is the loose end of “obvious to a human” and the tight end of “obvious to a hash.” Numbers between 6 and 10 all behaved similarly; the threshold is forgiving.
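To make the hash-and-threshold idea concrete, here is a minimal sketch using an average hash, one of the simplest 64-bit perceptual hashes. It is a stand-in and not necessarily the algorithm behind phash above. It assumes the frame has already been converted to 8-bit grayscale; average_hash and is_unchanged are hypothetical names.

```rust
/// Average hash: shrink the image to an 8x8 grid, then emit one bit per cell.
/// `gray` is a row-major 8-bit grayscale image, `width` x `height` pixels,
/// with both dimensions at least 8.
fn average_hash(gray: &[u8], width: usize, height: usize) -> u64 {
    assert!(width >= 8 && height >= 8 && gray.len() == width * height);
    let mut cells = [0u64; 64];
    for cy in 0..8 {
        for cx in 0..8 {
            // Pixel bounds of this grid cell.
            let (x0, x1) = (cx * width / 8, (cx + 1) * width / 8);
            let (y0, y1) = (cy * height / 8, (cy + 1) * height / 8);
            let mut sum = 0u64;
            for y in y0..y1 {
                for x in x0..x1 {
                    sum += gray[y * width + x] as u64;
                }
            }
            cells[cy * 8 + cx] = sum / ((x1 - x0) * (y1 - y0)) as u64;
        }
    }
    // Bit i is set when cell i is brighter than the mean of all cells.
    let mean = cells.iter().sum::<u64>() / 64;
    cells.iter().enumerate().fold(0u64, |hash, (i, &cell)| {
        if cell > mean { hash | (1u64 << i) } else { hash }
    })
}

/// Number of bit positions in which two 64-bit hashes differ.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Same comparison as the check above: the frame counts as unchanged when
/// fewer than `threshold` bits differ.
fn is_unchanged(prev: u64, current: u64, threshold: u32) -> bool {
    hamming_distance(prev, current) < threshold
}
```

With a threshold of 8, a frame that differs from the baseline in 7 bits is skipped and one that differs in 8 goes to OCR. A single changed pixel usually moves no bits at all, because it barely shifts the average of its cell.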
What that did to the chart
On a typical workday, with periods of activity and longer periods of reading or thinking, the dedup check eliminates 65-80% of frames before they reach OCR. CPU at idle dropped accordingly:
                idle CPU    busy-screen CPU
before fixes    ~22%        ~35%
after fix 1     ~11%        ~24%    (FTS no longer blocks capture)
after fix 2     ~3%         ~18%    (OCR only on changed frames)
The “busy” number is still meaningful — when there’s real change on screen, OCR runs, and OCR is genuinely not free. But the idle number is the one I actually care about, because most of the time I’m not actively typing. The daemon was supposed to be invisible. It is now invisible.
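For anyone who wants the relative numbers, the reductions implied by the table can be worked out in a few lines. The inputs are the table's rounded figures, so the results are approximate too; reduction_pct is a helper named here for illustration.

```rust
/// Percentage reduction going from `before` to `after`,
/// rounded to the nearest whole percent.
fn reduction_pct(before: f64, after: f64) -> f64 {
    ((before - after) / before * 100.0).round()
}
```

Fed the idle column, this gives 50 for fix 1 alone (22 to 11), 73 for fix 2 on top of it (11 to 3), and 86 overall (22 to 3). The busy-screen column comes out at 49 overall (35 to 18).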
What I’d put on a sticky note for next time
A trigger inside a high-frequency write path is a footgun. Triggers are convenient and they’re correct, and they will silently take the latency budget you didn’t know you were spending. If your INSERT is on a hot path, write the secondary index from a background worker. The official-docs pattern is the right pattern for low-write workloads. It is not the right pattern when you’re writing thousands of rows a day.
Dedupe before you spend. OCR was the obvious villain because it was the obvious expense. The actual problem was running OCR on data that didn’t need OCR. The same logic generalises: any expensive step deserves a cheap pre-check, and for screen frames a perceptual hash is about as cheap as a pre-check gets.
Look at the histogram, not the mean. Average CPU “in the 20s” was a story I’d been telling myself. The histogram showed something more useful: lots of samples at 80%+ in short bursts, then long stretches at 5%. The bursts were the thing to fix; the mean was hiding them. Profiling tools that show you the distribution are worth their disk space.
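The histogram point can be shown with a small sketch. The sample values used in the note below are made up to match the shape described above (short bursts at 80%, long stretches at 5%); they are not measurements from ScreenSearch, and the function names are illustrative.

```rust
/// Arithmetic mean of CPU samples, in percent.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Count samples into fixed-width buckets covering 0 to 100 percent.
/// Bucket i holds samples in [i * width, (i + 1) * width);
/// anything at or above the top edge lands in the last bucket.
fn histogram(samples: &[f64], width: f64) -> Vec<usize> {
    let buckets = (100.0 / width).ceil() as usize;
    let mut counts = vec![0usize; buckets];
    for &sample in samples {
        let index = ((sample / width) as usize).min(buckets - 1);
        counts[index] += 1;
    }
    counts
}
```

Sixteen samples at 5% plus four at 80% have a mean of exactly 20%, which looks like a steady load in the 20s. The histogram with 10-point buckets puts sixteen samples in the 0-10 bucket, four in the 80-90 bucket, and none anywhere near 20.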
Two fixes, one weekend, ~85% reduction in idle CPU. The improvements are not impressive engineering; they’re the result of looking at what was actually happening on the hot path and refusing to keep believing the comfortable story about why the chart looked the way it did.
Project: ScreenSearch. Related: Building ScreenSearch.