Tenant-aware rate limits, forged in the atomic path.
A multi-tenant rate-limit service that answers in a single Redis round-trip: four algorithms — token bucket, fixed window, sliding window, concurrency — each compiled into a Lua script so the decision, the header set, and the ledger update are indivisible. Plans and keys live in Postgres; decisions burn through Redis.
Fire a burst at the live service and watch it throttle.
Every button press below goes to the real POST /v1/check against a seeded demo tenant on the VPS. Switch the algorithm tab to shunt the burst through a different Lua path. The green bars are 200 allowed; the red bars are 429 throttled.
A bucket of tokens refills at a fixed rate. Each call consumes tokens equal to its cost; if the bucket is empty the call gets a 429 with a Retry-After header derived from the refill rate. Classic Stripe / AWS choice — tolerates bursts up to the cap but enforces the mean.
One round-trip from request to decision, plus a journal.
Plans and API keys live in Postgres and are warm-cached in Redis with TTLs. A decision never touches Postgres on the hot path — it only consults the cached plan and executes a single atomic Lua script against the Redis state for the (tenant, subject, resource) triple.
client ─── POST /v1/check ──▶ nginx ──▶ FastAPI (app.api.v1)
                                             │
                                             ▼
                       x-api-key verify (Redis first, Postgres fallback)
                                             │
                                             ▼
                  resolve Plan (cache hit → Redis, miss → Postgres + fill)
                                             │
                                             ▼
                                DecisionEngine.check(plan)
                                             │
             ┌────────────────┬──────────────┴───────────────┬────────────────┐
             ▼                ▼                              ▼                ▼
       token_bucket     fixed_window                   sliding_window    concurrency
           Lua           INCR + TTL                     ZSET + ZREM       ZSET + TTL
             │                │                              │                │
             └────────────────┴──────────────┬───────────────┴────────────────┘
                                             ▼
              decision = { allowed, remaining, reset_at, retry_after_ms }
                                             │
        ┌────────────────────────────────────┼────────────────────────────────┐
        ▼                                    ▼                                ▼
  response headers                  Prometheus counters              structlog JSON line
  X-RateLimit-Limit                 rl_allowed_total                 event=api.v1.check
  X-RateLimit-Remaining             rl_blocked_total                 algorithm=...
  X-RateLimit-Reset                                                  outcome=...
  Retry-After
Atomic path (Lua)
Read counter, compute new state, write, return — all inside a single EVAL. Redis's single-threaded execution guarantees no two decisions for the same key interleave, which is the only correctness property the limiter needs.
Plans warm the cache
Plan rows are hashed into Redis on first resolve and refreshed on every write via admin APIs. The hot POST /v1/check hits Postgres only on a cold tenant/plan pair.
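The cache-aside pattern described above can be sketched in a few lines. This is an illustrative model, not the service's code: plain dicts stand in for the Redis hash cache and the Postgres table, and `resolve_plan` / `update_plan` are hypothetical names.

```python
# Illustrative cache-aside plan lookup: dicts stand in for Redis (cache)
# and Postgres (store). In the real service the cache entry carries a TTL.

CACHE: dict[str, dict] = {}
STORE = {("t1", "pro"): {"algorithm": "token_bucket",
                         "bucket_capacity": 100,
                         "refill_rate_per_sec": 20}}

def resolve_plan(tenant_id: str, plan_name: str) -> dict:
    """Hot path: a cache hit never touches the store; a miss fills the cache."""
    key = f"plan:{tenant_id}:{plan_name}"
    plan = CACHE.get(key)
    if plan is None:                       # cold tenant/plan pair
        plan = STORE[(tenant_id, plan_name)]
        CACHE[key] = plan                  # fill so the next check is cache-only
    return plan

def update_plan(tenant_id: str, plan_name: str, fields: dict) -> None:
    """Admin write path: mutate the store, then refresh the cache entry."""
    STORE[(tenant_id, plan_name)] = {**STORE[(tenant_id, plan_name)], **fields}
    CACHE[f"plan:{tenant_id}:{plan_name}"] = STORE[(tenant_id, plan_name)]
```

The write path refreshes rather than invalidates, so a plan change never forces a Postgres read onto the next hot-path check.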
Subject granularity
The subject string is the rate-limit identity — it can be an end-user id, an API key hash, or a request IP. Concurrency plans also match on subject for tracking in-flight requests.
Header discipline
Every 200 and every 429 carry X-RateLimit-Limit / Remaining / Reset and Retry-After. Consumers can back off without re-requesting the plan — the state is on the wire.
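Because the state is on the wire, a client can compute its backoff from the response alone. A minimal sketch of such a client-side helper (the function and its fallback logic are illustrative; the header names are the ones listed above):

```python
import time

def backoff_seconds(status: int, headers: dict[str, str]) -> float:
    """Return 0 when the call may proceed, otherwise seconds to wait.

    Illustrative helper: trusts Retry-After first, then falls back to
    the X-RateLimit-Reset timestamp if the header is somehow absent.
    """
    if status != 429:
        return 0.0
    retry = headers.get("Retry-After")
    if retry is not None:
        return float(retry)                            # delay in seconds
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    return max(0.0, reset_at - time.time())
```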
Four limiter families. One decision contract.
Each algorithm is implemented as a Lua script with a stable return contract — { allowed, remaining, reset_at, retry_after_ms } — so the surrounding Python code is identical regardless of backend. Plans pick the one that matches the shape of traffic they're throttling.
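The shared contract might be modelled on the Python side roughly like this (a sketch; the field names mirror the tuple above, but the class and its `headers` method are illustrative, not the service's actual types):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    """One decision shape, whichever Lua script produced it."""
    allowed: bool
    remaining: int
    reset_at: float        # unix seconds when the limit fully resets
    retry_after_ms: int    # 0 when allowed

    def headers(self, limit: int) -> dict[str, str]:
        """Project the decision onto the response headers the service emits."""
        h = {
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": str(self.remaining),
            "X-RateLimit-Reset": str(int(self.reset_at)),
        }
        if not self.allowed:
            # Retry-After is whole seconds; round up so clients never retry early.
            h["Retry-After"] = str(max(1, -(-self.retry_after_ms // 1000)))
        return h
```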
Token bucket
A bucket of tokens refills continuously at r tok/s up to capacity C. Each call withdraws cost tokens; empty bucket ⇒ 429. Excellent when you want to tolerate short bursts but enforce the mean. Default choice.
tokens ← min(C, tokens + (now − ts) · r)
ts ← now
if tokens < cost:
    return (429, retry = (cost − tokens) / r)
tokens ← tokens − cost
return (200, remaining = floor(tokens))
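The pseudocode above can be modelled in plain Python. In the real service this arithmetic runs inside a Lua script; this in-memory sketch just mirrors it, with time passed in explicitly so the refill is easy to follow:

```python
def token_bucket(state: dict, now: float, cost: int, C: float, r: float):
    """Returns (status, remaining_or_retry). state holds 'tokens' and 'ts'."""
    tokens = min(C, state["tokens"] + (now - state["ts"]) * r)  # continuous refill
    state["ts"] = now
    if tokens < cost:
        state["tokens"] = tokens
        return 429, (cost - tokens) / r        # seconds until enough tokens exist
    state["tokens"] = tokens - cost
    return 200, int(state["tokens"])
```

With C=10 and r=2, eleven simultaneous calls admit the first ten (the burst up to the cap) and 429 the eleventh with a 0.5 s retry, which is exactly "tolerate bursts, enforce the mean".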
Fixed window
A counter per calendar window of W seconds; +1 per call, 429 when the counter exceeds L. Cheapest to compute (a single INCR + EXPIRE) and gives a strong per-window upper bound. Prone to edge bursts at window boundaries: up to 2L calls can land in the moments either side of a boundary.
key = tenant:subject:resource:floor(now/W)
n = INCR(key); if new: EXPIRE(key, W)
if n > L:
    return (429, retry = W − (now mod W))
return (200, remaining = L − n)
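An in-memory model of the same counter, with a dict standing in for Redis INCR + EXPIRE (illustrative only; expiry is implicit here because old window keys are simply never read again):

```python
import math

COUNTERS: dict[str, int] = {}

def fixed_window(subject: str, now: float, W: int, L: int):
    """One counter per calendar window; rolls over at each floor(now/W) boundary."""
    key = f"{subject}:{math.floor(now / W)}"
    n = COUNTERS[key] = COUNTERS.get(key, 0) + 1   # INCR
    if n > L:
        return 429, W - (now % W)                  # retry at the window boundary
    return 200, L - n
```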
Sliding window (log)
Timestamps live in a sorted set; on each call trim entries older than now − W, then accept only if |set| < L. Removes fixed-window's edge effect at the cost of O(L) memory per active subject — one timestamp per call retained within the window.
ZREMRANGEBYSCORE key −∞ now − W
k = ZCARD key
if k ≥ L:
    return (429, retry derived from oldest)
ZADD key now request_id
EXPIRE key W + 1
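The same log can be modelled with a sorted list of timestamps standing in for the ZSET (an illustrative sketch; `bisect` keeps the trim and insert cheap):

```python
import bisect

LOG: dict[str, list[float]] = {}

def sliding_window(subject: str, now: float, W: float, L: int):
    """Accept only if fewer than L calls landed in the last W seconds."""
    ts = LOG.setdefault(subject, [])
    del ts[:bisect.bisect_left(ts, now - W)]   # ZREMRANGEBYSCORE −∞ .. now−W
    if len(ts) >= L:                           # ZCARD
        return 429, ts[0] + W - now            # retry when the oldest entry ages out
    bisect.insort(ts, now)                     # ZADD
    return 200, L - len(ts)
```

Note the retry hint: it is derived from the oldest surviving timestamp, which is the "retry derived from oldest" step in the pseudocode above.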
Concurrency
Tracks open calls per subject, not rate. A request ticket is added to the subject's sorted set on entry; the call's id is evicted on completion (or by stale-TTL if the caller never returns). 429 when the open count hits C.
now = timestamp_ms()
ZREMRANGEBYSCORE key −∞ now − STALE_TTL
n = ZCARD key
if n ≥ C:
    return (429, retry = STALE_TTL)
ZADD key now request_uuid
return (200, remaining = C − n − 1)
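An in-memory model of the ticket lifecycle, with a dict of open request ids standing in for the ZSET (illustrative; `acquire`/`release` are hypothetical names for the entry and completion paths):

```python
OPEN: dict[str, dict[str, float]] = {}   # subject -> {request_id: entered_at_ms}

def acquire(subject: str, request_id: str, now_ms: int, C: int, stale_ttl_ms: int):
    """Admit a call only while fewer than C are in flight for this subject."""
    inflight = OPEN.setdefault(subject, {})
    # Reap tickets whose caller never released them (ZREMRANGEBYSCORE).
    for rid, t in list(inflight.items()):
        if t <= now_ms - stale_ttl_ms:
            del inflight[rid]
    if len(inflight) >= C:
        return 429, stale_ttl_ms
    inflight[request_id] = now_ms
    return 200, C - len(inflight)        # matches remaining = C − n − 1 above

def release(subject: str, request_id: str) -> None:
    """Evict the ticket on completion (ZREM in the real script)."""
    OPEN.get(subject, {}).pop(request_id, None)
```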
A narrow public surface, administered out-of-band.
Callers only ever hit /v1/check — the entire decision contract is expressed there. /v1/admin/* is behind a bearer token and is where tenants, plans, keys, and resource policies are provisioned. Everything is JSON; OpenAPI spec is live.
The hot path. Authenticates with x-api-key, resolves (or overrides) the plan, returns a decision in ~1ms.
curl -X POST /limitforge-rls/v1/check \
  -H "x-api-key: $KEY" \
  -H "content-type: application/json" \
  -d '{"resource":"GET:/orders","subject":"user:42",
       "cost":1,"plan_id":"…"}'
Cheap liveness. Returns {"status":"ok","version":"0.1.0"}. Also exported to Prometheus.
curl /limitforge-rls/v1/health
{"status":"ok","version":"0.1.0"}
Create a plan. Algorithm-specific fields are validated — e.g. token_bucket requires bucket_capacity + refill_rate_per_sec.
curl -X POST /v1/admin/plans \
  -H "Authorization: Bearer $ADMIN" \
  -d '{"tenant_id":"…","name":"pro",
       "algorithm":"token_bucket",
       "bucket_capacity":100,
       "refill_rate_per_sec":20}'
Mint an API key for a tenant. Secret is returned exactly once; salted-SHA256 hash is stored and cached.
curl -X POST /v1/admin/keys \
  -H "Authorization: Bearer $ADMIN" \
  -d '{"tenant_id":"…","name":"prod-app"}'
→ {"key":"…","key_hash":"…"}
Prometheus, structlog, OpenTelemetry — all first-class.
You can't fix a limiter you can't see. Every decision emits a Prometheus counter, a structured log line, and an OTEL span; no print()s, no ad-hoc metrics.
Metrics
rl_allowed_total and rl_blocked_total counters labelled by route and outcome. Plus requests_total{route,outcome} for every FastAPI route.
Logs
JSON lines via structlog. Every check logs algorithm, tenant, subject_hash, and outcome so ad-hoc analysis is one jq away.
Traces
OTEL instrumentation on FastAPI, SQLAlchemy and Redis. Each /v1/check span carries the plan id and decision. Export to any OTLP collector.
“A rate limiter is a contract, not a barrier — it tells the client exactly how to behave.”
— Design note · LimitForge RLS, 2026