Atomic rate-limiting on the Cloudflare free tier

Why my old rate-limiter wasn't actually limiting anything, and the three-store architecture that fixed it without leaving the free plan.


The original TrackerSync rate-limiter was a KV-backed read-modify-write loop. It looked fine on paper and let abuse through in practice: two concurrent requests could both read “0 of 3”, both write 1, and both go through. Now you’ve burned 2 of 3 daily conversions in one second, and the counter thinks you’ve used one. This is how I replaced it with an atomic system on D1 + Durable Objects, kept KV only for graceful degradation, and stayed on the free plan.

The bug I didn’t believe I had

I noticed the bug the same way you usually notice this bug: someone with a script discovered the limit wasn’t a limit. The script wasn’t sophisticated. It just sent five requests at once. Three were “supposed to” succeed, two were “supposed to” be blocked. All five succeeded.

The race was simple, embarrassing, and exactly what every textbook tells you not to do with KV:

t=0ms  request A reads count=0   ──┐
t=0ms  request B reads count=0   ──┤    KV: count=0
t=1ms  request A writes count=1  ──┤
t=1ms  request B writes count=1  ──┘    KV: count=1 (should be 2)
t=2ms  request A is allowed
t=2ms  request B is allowed

KV is eventually consistent and last-write-wins. Read-modify-write on top of KV is not atomic; it’s a polite suggestion that two requests will please not arrive at the same time. They arrived at the same time.
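To make the timeline concrete, here is a small simulation of both behaviours. It is a sketch, not the original code: `FakeKV`, `racyCheck`, and `atomicCheck` are hypothetical names, the fake store models only last-write-wins (not KV's propagation delay), and the worst-case interleaving (every read lands before any write) is forced rather than left to timing.

```javascript
// Hypothetical in-memory stand-in for a last-write-wins store like KV.
// It models only "last write wins", not replication or propagation delay.
class FakeKV {
  constructor() {
    this.data = new Map();
  }
  get(key) {
    return this.data.get(key) ?? 0;
  }
  put(key, value) {
    this.data.set(key, value); // silently overwrites whatever was there
  }
}

// Read-modify-write with the worst-case interleaving forced:
// every concurrent request reads before any request writes.
function racyCheck(kv, key, limit, n) {
  const seen = [];
  for (let i = 0; i < n; i++) seen.push(kv.get(key)); // all read the same value
  return seen.map((count) => {
    kv.put(key, count + 1); // each writes its own stale count + 1
    return count + 1 <= limit; // each decides from its own stale view
  });
}

// Atomic increment: read and write are one indivisible step, which is
// what the D1 upsert provides. Every attempt is counted, blocked or not.
function atomicCheck(counts, key, limit, n) {
  const results = [];
  for (let i = 0; i < n; i++) {
    const next = (counts.get(key) ?? 0) + 1;
    counts.set(key, next);
    results.push(next <= limit);
  }
  return results;
}

const kv = new FakeKV();
console.log(racyCheck(kv, "client:conversions", 3, 5)); // [ true, true, true, true, true ]
console.log(kv.get("client:conversions")); // 1

const counts = new Map();
console.log(atomicCheck(counts, "client:conversions", 3, 5)); // [ true, true, true, false, false ]
console.log(counts.get("client:conversions")); // 5
```

Five requests against a limit of three: the read-modify-write version lets all five through and records one, while the atomic version counts all five and blocks the last two.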

I’d known this in the abstract for as long as I’d used KV. What I hadn’t done was sit down and ask whether anything in my hot path actually needed atomicity. Rate limiting needed atomicity. I had built rate limiting on KV. That was the bug.

What “rate limit” actually demands

I wrote it down to stop kidding myself:

  1. Atomic increment. Two concurrent requests must see different post-states. Not “usually.” Always.
  2. Read-your-writes. If the previous request just got blocked, this request must also see “blocked” — no propagation delay.
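  3. Bounded cost. Counters get hit on every request. KV’s free plan allows only 1,000 writes a day, so a write-per-request counter runs out fast; D1 charges per row read and written, with a free allowance of 5 million rows read and 100,000 rows written a day.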
  4. Survives a partial outage. If the authoritative store is down for a minute, requests shouldn’t all 500. Degrade, don’t fall over.

KV gives you (4), and (3) only for read-heavy data; it fails (1) and (2). D1 gives you (1), (2), and (3) and is more brittle on (4). Durable Objects give you (1) and (2) cleanly and are not great for (3) at high cardinality. The architecture writes itself once you stop trying to make any single store do all four.

The three-store layout

flowchart LR
    C[Client] --> W[Worker]
    W -->|hot path| D1[(D1 — authoritative<br/>atomic counters,<br/>reputation, config)]
    W -->|hot path| DO[(Durable Object<br/>suspicious / blocked state)]
    W -.degraded only.-> KV[(KV — fallback<br/>+ status cache)]
    D1 -.async.-> R2[(R2 — analytics<br/>+ audit trail)]
    DO -.async.-> R2
    style D1 fill:#e6f4ea,stroke:#2c7a3f
    style DO fill:#e3edff,stroke:#1f4eb0
    style KV fill:#f3f3f3,stroke:#aaa
    style R2 fill:#fff3cd,stroke:#b58900

The shape that makes this work is the UNIQUE index. D1’s INSERT ... ON CONFLICT(...) DO UPDATE SET request_count = request_count + 1 is genuinely atomic — two concurrent requests serialize on the index and one of them sees the other’s increment. The race goes away because the database refuses to let it happen.

The schema, in one screen

CREATE TABLE rate_limits (
  id              INTEGER PRIMARY KEY AUTOINCREMENT,
  client_id       TEXT NOT NULL,    -- signal blend, not raw IP
  endpoint        TEXT NOT NULL,    -- 'conversions', 'uploads', ...
  window_start    INTEGER NOT NULL, -- bucketed unix time
  request_count   INTEGER NOT NULL DEFAULT 1,
  timestamp       INTEGER NOT NULL,
  metadata        TEXT
);

-- The line that does the work:
CREATE UNIQUE INDEX idx_rate_limits_lookup
  ON rate_limits(client_id, endpoint, window_start);

CREATE INDEX idx_rate_limits_timestamp ON rate_limits(timestamp);

The atomic check, trimmed to the lines the worker actually runs:

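// Inside the fetch handler: clientId, endpoint, and limit are set earlier.
// WINDOW_SECONDS (hypothetical name) is the window length, e.g. 86400 for a daily cap.
const now = Math.floor(Date.now() / 1000);         // unix seconds
const windowStart = now - (now % WINDOW_SECONDS);  // bucketed window start

// One statement, so the increment is atomic: concurrent requests for the same
// (client_id, endpoint, window_start) row each get a different post-increment count.
const row = await env.DB.prepare(`
  INSERT INTO rate_limits (client_id, endpoint, window_start, timestamp)
  VALUES (?1, ?2, ?3, ?4)
  ON CONFLICT(client_id, endpoint, window_start)
  DO UPDATE SET request_count = request_count + 1,
                timestamp     = excluded.timestamp
  RETURNING request_count
`).bind(clientId, endpoint, windowStart, now).first();

if (row.request_count > limit) {
  // Record the block in the Durable Object so later requests skip D1 entirely.
  await markBlocked(clientId, endpoint);
  return new Response('Rate limited', { status: 429 });
}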

The block notice goes to the Durable Object so the next request from this clientId short-circuits before it ever hits D1 again. That keeps D1’s per-day row-scan budget sane on the free plan.
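The flow described above (ask the Durable Object first, only then touch D1, and record a block once the count crosses the limit) can be sketched with the stores injected as async functions, so it runs outside Workers. `checkRateLimit`, `isBlocked`, `incrementCount`, and the in-memory fakes are hypothetical; in a real Worker, `isBlocked` and `markBlocked` would call the Durable Object's stub and `incrementCount` would be the D1 upsert above.

```javascript
// Decide one request. `stores` is injected so the flow can run outside Workers:
//   isBlocked(clientId, endpoint)      -> Promise<boolean>  (Durable Object in production)
//   incrementCount(clientId, endpoint) -> Promise<number>   (D1 upsert ... RETURNING)
//   markBlocked(clientId, endpoint)    -> Promise<void>     (Durable Object in production)
async function checkRateLimit(clientId, endpoint, limit, stores) {
  // Short-circuit: a client already marked blocked never reaches D1.
  if (await stores.isBlocked(clientId, endpoint)) {
    return { allowed: false, reason: "blocked" };
  }
  const count = await stores.incrementCount(clientId, endpoint);
  if (count > limit) {
    await stores.markBlocked(clientId, endpoint);
    return { allowed: false, reason: "over-limit" };
  }
  return { allowed: true, count };
}

// In-memory fakes that also count how often the "D1" path is hit.
function makeFakeStores() {
  const counts = new Map();
  const blocked = new Set();
  const stats = { d1Calls: 0 };
  const key = (c, e) => `${c}:${e}`;
  return {
    stats,
    isBlocked: async (c, e) => blocked.has(key(c, e)),
    incrementCount: async (c, e) => {
      stats.d1Calls++;
      const next = (counts.get(key(c, e)) ?? 0) + 1;
      counts.set(key(c, e), next);
      return next;
    },
    markBlocked: async (c, e) => {
      blocked.add(key(c, e));
    },
  };
}

// Six sequential requests against a limit of three.
async function demo() {
  const stores = makeFakeStores();
  const results = [];
  for (let i = 0; i < 6; i++) {
    results.push(await checkRateLimit("client-a", "conversions", 3, stores));
  }
  return { results, d1Calls: stores.stats.d1Calls };
}

demo().then(({ results, d1Calls }) => {
  console.log(results.map((r) => r.reason ?? "ok").join(", ")); // ok, ok, ok, over-limit, blocked, blocked
  console.log(d1Calls); // 4
});
```

The `d1Calls` count is the point: once the fourth request records the block, requests five and six are answered from the Durable Object without touching D1.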

The client ID problem

A counter is only as good as what it counts. The original system keyed on raw IP. Anyone behind a CGNAT or a VPN was either over-limited (sharing with neighbours) or under-limited (rotating exit nodes). I needed something stronger than IP and lighter than a full browser fingerprint.

The blend I settled on:

layer                    what it is                                         when it’s used
───────────────────────  ─────────────────────────────────────────────────  ──────────────
Signed session cookie    ts_session, set after a Turnstile pass             Primary; most users have one
Lightweight fingerprint  A small handful of stable browser signals, hashed  Fallback when no cookie
Hashed IP                sha256(ip + salt), truncated                       Pressure signal: too many fingerprints from one IP-hash → squeeze limits

The cookie is the cheap, accurate path. The fingerprint catches incognito and fresh-browser cases. The IP-hash isn’t a rate-limit key itself; it’s a pressure signal — if I see twenty distinct fingerprints behind one IP-hash in a short window, I dial that bucket’s effective limit down for an hour. It’s a soft signal, designed to slow a script down rather than block a household.

D1 stores hashes only. There’s no raw IP at rest. Privacy and the free-tier budget happened to point the same direction here.
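A minimal sketch of the blend in Node, assuming `node:crypto` for hashing (inside a Worker, `crypto.subtle.digest("SHA-256", data)` would do the same job but returns a Promise). The function names, the `s:`/`f:` key prefixes, the 16-hex-character truncation, and the in-memory `PressureTracker` are hypothetical; the cookie-then-fingerprint order, the salted IP hash, and the twenty-fingerprint threshold come from the text above.

```javascript
const { createHash } = require("node:crypto");

// sha256 hex digest, truncated to `length` hex characters (truncation length assumed).
function shortHash(input, length = 16) {
  return createHash("sha256").update(input).digest("hex").slice(0, length);
}

// Rate-limit key: the signed session cookie if present, else the hashed fingerprint.
// Assumes the cookie's signature was verified before this is called.
function clientIdFor({ sessionCookie, fingerprintSignals = [] }) {
  if (sessionCookie) return `s:${shortHash(sessionCookie)}`;
  return `f:${shortHash(JSON.stringify(fingerprintSignals))}`;
}

// The IP is never stored raw: only sha256(ip + salt), truncated.
function ipHash(ip, salt) {
  return shortHash(ip + salt);
}

// Hypothetical pressure tracker: distinct fingerprints seen behind each IP-hash.
// In-memory only; a real version would expire entries after a short window.
class PressureTracker {
  constructor(threshold = 20) {
    this.threshold = threshold;
    this.seen = new Map(); // ipHash -> Set of fingerprint hashes
  }
  record(ipHashValue, fingerprintHash) {
    if (!this.seen.has(ipHashValue)) this.seen.set(ipHashValue, new Set());
    this.seen.get(ipHashValue).add(fingerprintHash);
  }
  // True once the threshold of distinct fingerprints is reached; the caller
  // would then dial down that bucket's effective limit.
  underPressure(ipHashValue) {
    return (this.seen.get(ipHashValue)?.size ?? 0) >= this.threshold;
  }
}

const tracker = new PressureTracker();
const ipH = ipHash("203.0.113.7", "example-salt");
for (let i = 0; i < 20; i++) tracker.record(ipH, shortHash(`fp-${i}`));
console.log(tracker.underPressure(ipH)); // true
```

Because the tracker counts distinct fingerprints, one household refreshing the same browser never moves the needle; only many different-looking clients behind one IP-hash do.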

Reputation, in four buckets

The system also tracks a per-client reputation_score. It starts at 100 and decays on violations. Effective limits scale with reputation:

score  bucket     effective limit
─────  ────────   ──────────────
100-75 LOW        full
74-50  MEDIUM     60% of normal
49-25  HIGH       30% of normal
24-0   CRITICAL   10% of normal
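The table maps directly onto a lookup. A minimal sketch; `bucketFor` and `effectiveLimit` are hypothetical names, and rounding down with a floor of one request is an assumption, since the post doesn’t say how fractional caps round.

```javascript
// Buckets from the table: lowest score in the bucket and percent of the normal limit.
const REPUTATION_BUCKETS = [
  { name: "LOW", minScore: 75, percent: 100 },
  { name: "MEDIUM", minScore: 50, percent: 60 },
  { name: "HIGH", minScore: 25, percent: 30 },
  { name: "CRITICAL", minScore: 0, percent: 10 },
];

// Scores live in 0..100; clamp so out-of-range values still land in a bucket.
function bucketFor(score) {
  const s = Math.min(100, Math.max(0, score));
  return REPUTATION_BUCKETS.find((b) => s >= b.minScore);
}

// Multiply before dividing so whole-number results stay exact.
// Rounding down with a floor of 1 is an assumption, not from the post.
function effectiveLimit(normalLimit, score) {
  const { percent } = bucketFor(score);
  return Math.max(1, Math.floor((normalLimit * percent) / 100));
}

console.log(bucketFor(74).name, effectiveLimit(100, 74)); // MEDIUM 60
console.log(bucketFor(24).name, effectiveLimit(100, 24)); // CRITICAL 10
console.log(effectiveLimit(3, 60)); // 1
```

With small caps the floor matters: a normal limit of 3 gives 1 request in MEDIUM, HIGH, and CRITICAL alike, so the buckets only separate once the base limit is larger.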

A first-time user gets full caps. A user who’s just hammered the API gets less. The decay is generous enough that a single mistake doesn’t lock you out, and the recovery is slow enough that a determined abuser can’t just wait out a five-minute window.

This isn’t fancy. It’s a single column in client_reputation, updated in the same transaction that records a violation. It also doubles as the only metric I actually care about for “is something weird happening”: a sudden jump in CRITICAL-bucket clients means I have a real problem, not a chart problem.

What broke during the rollout

Three things, in order of how much I deserved them.

The migration order. I deployed the new D1 schema before the worker code that used it, intending to roll out the worker change behind a flag. The order of operations was fine; what wasn’t fine was that I’d written the migration against a stale local D1 snapshot, so in production it hit an index-name collision. The deploy failed cleanly (D1 refused it), and I learned to migrate against a fresh prod snapshot every time.

The DO key cardinality. The first version of the signal blend hashed (cookie || ip) into the DO key. That gave logged-in users a stable DO per cookie, which is fine, but the IP-only fallback users all collapsed into a small number of subnet-shaped DOs. A Durable Object processes its requests one at a time, so a single noisy subnet briefly became a global queue. The fix was a wider blend (cookie ⊕ fingerprint ⊕ ip_subnet), which keeps cardinality high enough that no single DO carries a meaningful slice of traffic. I noticed because P99 latency on /api/convert doubled. I am grateful for P99 charts.
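A sketch of deriving the Durable Object name from the wider blend. Hashing the JSON-encoded parts together stands in for the ⊕ in the text; `ipSubnet`, `doKeyFor`, and the /24 granularity for IPv4 are assumptions. In a Worker, the resulting string would be passed to the namespace binding's `idFromName()` to address one object per key.

```javascript
const { createHash } = require("node:crypto");

// Assumed granularity: IPv4 /24. IPv6 is passed through unchanged here; a real
// version would expand the address and keep a prefix such as /64.
function ipSubnet(ip) {
  if (ip.includes(":")) return ip;
  return ip.split(".").slice(0, 3).join(".") + ".0/24";
}

// Hash every available signal together, so two clients share a DO only when
// cookie, fingerprint, and subnet all match. JSON encoding keeps the field
// boundaries unambiguous before hashing.
function doKeyFor({ sessionCookie = "", fingerprint = "", ip }) {
  const blend = JSON.stringify([sessionCookie, fingerprint, ipSubnet(ip)]);
  return createHash("sha256").update(blend).digest("hex");
}

// Two cookie-less clients on the same /24 with different fingerprints.
const a = doKeyFor({ fingerprint: "fp-1", ip: "198.51.100.10" });
const b = doKeyFor({ fingerprint: "fp-2", ip: "198.51.100.99" });
console.log(a === b); // false: each gets its own DO instead of one per subnet
```

Under the old (cookie || ip) scheme, those two clients would have landed in the same subnet-shaped DO and waited on each other.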

The fallback path being too generous. When D1 was unavailable for a minute (planned maintenance, my fault), the KV-backed fallback kicked in with the same caps as the authoritative path. Two requests slipped through that shouldn’t have. The fallback now uses caps of about 40% of normal, on the theory that a brief over-block does much less damage than a brief under-block.
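A minimal sketch of the degraded path, with the stores injected as async functions so it runs outside Workers. The 40% factor comes from the text; `checkWithFallback`, the `d1Increment`/`kvIncrement` hooks, and rounding down with a floor of one are assumptions.

```javascript
const FALLBACK_PERCENT = 40; // fallback caps are ~40% of normal

// stores.d1Increment(key) -> Promise<number>: authoritative, atomic counter
// stores.kvIncrement(key) -> Promise<number>: degraded-mode counter (not atomic)
async function checkWithFallback(key, limit, stores) {
  try {
    const count = await stores.d1Increment(key);
    return { allowed: count <= limit, mode: "authoritative" };
  } catch {
    // D1 unavailable: degrade to KV with a tighter cap rather than return a 500.
    // If KV also throws, that error propagates to the caller.
    const fallbackLimit = Math.max(1, Math.floor((limit * FALLBACK_PERCENT) / 100));
    const count = await stores.kvIncrement(key);
    return { allowed: count <= fallbackLimit, mode: "degraded" };
  }
}

// Fake stores: D1 is "down", KV is backed by a Map.
const kvCounts = new Map();
const downStores = {
  d1Increment: async () => {
    throw new Error("D1 unavailable");
  },
  kvIncrement: async (key) => {
    const next = (kvCounts.get(key) ?? 0) + 1;
    kvCounts.set(key, next);
    return next;
  },
};

(async () => {
  // Normal limit 5 -> fallback limit 2.
  for (let i = 0; i < 3; i++) {
    const r = await checkWithFallback("client-a:conversions", 5, downStores);
    console.log(r.mode, r.allowed); // degraded true, degraded true, then degraded false
  }
})();
```

The KV-side counter is still a read-modify-write, so under concurrency it can under-count exactly as in the timeline at the top of this post; the tighter cap is what limits the damage while D1 is away.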

What this costs to run

Free plan, six months in:

                      free-plan ceiling             typical day   peak day
KV ops/day            100,000 reads, 1,000 writes   <500          ~1,500
D1 rows read/day      5,000,000                     ~80,000       ~250,000
D1 rows written/day   100,000                       ~3,500        ~12,000
DO requests/day       100,000                       ~15,000       ~45,000
R2 class A ops/day    ~33,000 (1,000,000/month)     ~120          ~300

All well inside the free ceilings. The architecture that fixed the race condition also fits the budget. That part I will admit was a pleasant surprise.

Three things I’d tell past-me

KV is fine, just not for counters. It’s a great config store. It’s a great cache. It is a terrible primitive for anything that needs concurrent atomic updates, and the failure mode of pretending otherwise is silent — the limit just doesn’t limit, and you find out via a script and a chart.

Pick the storage by the property, not by the API surface. D1, DO, KV, and R2 each have a property the others lack. Build a small chart of which property each request needs and the storage choices fall out for you.

Ship the fallback path first, not last. I built the happy path, deployed it, and then noticed I’d left the fallback path with the old behaviour. Build the degraded mode at the same time as the primary mode, or you’ll discover the degraded mode the hard way during the first outage.


Project: TrackerSync — free Fitbit → Garmin migration on Cloudflare’s free tier. Companion piece: Killing KV in the hot path.