TrackerSync is a free, serverless Fitbit → Garmin migration tool I run on Cloudflare’s free tier. In February 2026, every normal API request was spending 6 to 9 KV operations before doing any actual work. At even modest traffic that meant burning through the free-tier KV ceiling in hours. This is the story of moving rate-limiting to a Durable Object, metadata to D1, and shrinking KV’s role to “graceful degradation only.” Result: zero KV ops on the happy path.
How I noticed
The first warning was a 429 from my own site, on my own laptop, after I’d run two test conversions in a row. That shouldn’t be possible — the rate limits are per client, and I’m exactly one client. I opened the Cloudflare dashboard expecting a misconfigured threshold. The KV ops counter is what made me close my laptop and go for a walk.
At the traffic TrackerSync was getting — a few hundred conversions a day — the KV daily-operations chart was already at 80% of free-tier ceiling and trending up. I’d been telling myself the chart looked like that because I was testing. The chart did not care what I told myself.
The shape of the problem
TrackerSync’s hot path looks deceptively simple:
client → /api/upload → /api/validate → /api/convert → /api/download
Underneath, each of those endpoints was hitting Cloudflare KV multiple times before it even touched the user’s file:
| Endpoint | KV ops/request (before) | KV ops/request (after) |
|---|---|---|
/api/upload | 6 – 9 | 0 |
/api/validate | 6 – 9 | 0 |
/api/convert | 7 – 10 | 0 |
/api/download/* | 6 – 9 | 0 |
/api/usage/* | 3 | 0 |
The 6-9 wasn’t one careless KV call repeated; it was four legitimate concerns layered on top of each other:
- Suspicious-client check in global middleware. KV read.
- Blocked-client check. KV read.
- Multi-tier rate-limiter. Up to 3 KV reads (per-IP, per-fingerprint, per-cookie).
- Health probe churn. A background “is KV alive?” probe that wrote, read, and deleted a sentinel key — running often enough to dominate the budget at low traffic.
Each one made sense on its own. Stacked, they were a free-tier killer.
Why KV was the wrong tool
KV is wonderful for what it’s designed for: globally-replicated, eventually-consistent reads. The cache-style reads are practically free in latency terms. But “rate limit” and “did this IP misbehave?” have requirements KV doesn’t meet well:
- Atomicity. Two concurrent requests need to agree on whether a counter has been incremented. KV is last-write-wins.
- Read-your-writes consistency. A blocked client should be blocked immediately, not after KV’s propagation window.
- Pricing model. KV charges per operation; D1 charges per row-scan. Counters that get hit on every request are cheaper as rows than as KV keys.
Once you say it out loud — “I need atomic counters with read-your-writes consistency” — the answer is obvious: Durable Objects, with D1 as the durable record. The reason I hadn’t said it out loud six months earlier is the reason I hadn’t said it out loud six months earlier. KV had been the thing I reached for when I was building fast, and it kept working in the way KV always keeps working: just well enough that the bill comes later.
The new architecture
Three swaps did the work:
1. Rate-limiting moves into a Durable Object
AtomicRateLimiter is a Durable Object keyed by client signal (fingerprint + IP-hash + cookie blend). Inside the DO, increments are serialized — no race conditions, no double-spends. The DO’s storage API is read-after-write consistent by construction, so a newly-blocked client stays blocked on the next request even if it lands in a different isolate.
The middleware became three lines:
const limiter = env.RATE_LIMIT_DO.idForName(signal);
const stub = env.RATE_LIMIT_DO.get(limiter);
const decision = await stub.fetch(request);
2. Transient metadata moves from KV to D1
Uploads and conversions both leave a metadata breadcrumb that downstream endpoints need to read. These had been KV entries with a TTL. They’re now D1 rows:
CREATE TABLE uploads (
upload_id TEXT PRIMARY KEY,
metadata TEXT,
created_at INTEGER,
expires_at INTEGER
);
CREATE INDEX idx_uploads_expires_at ON uploads(expires_at);
Expired rows are treated as missing on read and lazily deleted during maintenance. The TTLs (3600s for uploads, 7200s for conversions) didn’t change; only the backing store did.
3. Health probes leave the hot path
The “is KV alive?” check used to run on every request as a write-read-delete cycle on a sentinel key. That was three KV ops per request just to know whether KV was sad. The new arrangement:
- Hot path: assume KV is fine; rely on D1+DO for correctness.
- Cold path: a single read-only
get()of a known sentinel, run only when status endpoints or maintenance jobs ask. - Degraded mode: if D1 fails, fall back to KV-based limiting. KV is the lifeboat, not the engine.
The shape of the result
The first 24 hours after deployment told a clear story:
KV ops/day
┌────────────────────────────────────────────────────┐
│ before ████████████████████████████████ ~310,000 │
│ after ▏ <500 (fallback + status pings) │
└────────────────────────────────────────────────────┘
(~99.8% reduction)
D1 query volume went up, but D1 row reads on the free tier have a much larger budget than KV ops, and most reads hit the uploads / conversions indexes. The new bottleneck — if there is one — is D1 row-reads, which I now have a clean signal on.
The DO didn’t blow up. That was the part I was most nervous about: Durable Objects serialize requests by key, so a misconfigured signal blend could turn a single noisy client into a global queue. The blend (fingerprint_hash ⊕ ip_subnet) keeps cardinality high enough that no single DO instance carries a meaningful slice of traffic. I had a half-written rollback PR open in another tab for the first 48 hours. I closed it without reading it again.
What this cost me
Three things to be honest about.
Migration risk. Moving the rate-limit decision to a new storage backend is the kind of change that, done badly, lets every request through for fifteen minutes. I shipped it behind a flag with KV as a fallback for the first week, watched the metrics, and only removed the flag once D1 + DO had carried a full traffic cycle without complaint.
A new failure mode. The DO is now on the critical path. If a Durable Object namespace has trouble, every request slows down or fails. The intelligent-fallback path covers it, but the latency of falling back is a real thing I now have to alert on.
Per-tenant noise. D1 is single-region (closest replica reads, single writer). At low scale this is fine; at higher scale I’ll need to think about read replicas or sharding. That’s a future problem, not today’s.
The takeaway
KV is a great primitive for the workloads it was designed for: globally-cached configuration, session-shaped reads, read-heavy lookups where eventual consistency is fine. The mistake I made — and I think it’s a common one — was reaching for KV because it was the first key-value store at hand, not because it fit.
Rate limiting wants atomicity. Metadata wants transactional reads with a TTL. Both have better homes on Cloudflare’s stack than KV. The hot path is cleaner, the free-tier headroom is back, and — quieter benefit — the worker is now easier to reason about, because each storage layer has one job.
Project: TrackerSync — free Fitbit → Garmin migration on Cloudflare’s free tier.