Tenant-aware rate limits, forged in the atomic path.
A multi-tenant rate-limit service that answers in a single Redis round-trip: four algorithms — token bucket, fixed window, sliding window, concurrency — each compiled into a Lua script so the decision, the header set, and the ledger update are indivisible. Plans and keys live in Postgres; decisions burn through Redis.
Fire a burst at the live service and watch it throttle.
Every button press below goes to the real POST /v1/check against a seeded demo tenant on the VPS. Switch the algorithm tab to shunt the burst through a different Lua path. The green bars are 200 allowed; the red bars are 429 throttled.
A bucket of tokens refills at a fixed rate. Each call consumes tokens equal to its cost; if the bucket is empty the call gets a 429 with a Retry-After header derived from the refill rate. Classic Stripe / AWS choice — tolerates bursts up to the cap but enforces the mean.
One round-trip from request to decision, plus a journal.
Plans and API keys live in Postgres and are warm-cached in Redis with TTLs. A decision never touches Postgres on the hot path — it only consults the cached plan and executes a single atomic Lua script against the Redis state for the (tenant, subject, resource) triple.
client ─── POST /v1/check ──▶ nginx ──▶ FastAPI (app.api.v1)
                                             │
                                             ▼
                       x-api-key verify (Redis first, Postgres fallback)
                                             │
                                             ▼
                  resolve Plan (cache hit → Redis, miss → Postgres + fill)
                                             │
                                             ▼
                                DecisionEngine.check(plan)
                                             │
             ┌────────────────┬──────────────┴───────────────┬────────────────┐
             ▼                ▼                              ▼                ▼
       token_bucket     fixed_window                   sliding_window    concurrency
           Lua           INCR + TTL                     ZSET + ZREM       ZSET + TTL
             │                │                              │                │
             └────────────────┴──────────────┬───────────────┴────────────────┘
                                             ▼
              decision = { allowed, remaining, reset_at, retry_after_ms }
                                             │
        ┌────────────────────────────────────┼────────────────────────────────┐
        ▼                                    ▼                                ▼
  response headers                  Prometheus counters              structlog JSON line
  X-RateLimit-Limit                 rl_allowed_total                 event=api.v1.check
  X-RateLimit-Remaining             rl_blocked_total                 algorithm=...
  X-RateLimit-Reset                                                  outcome=...
  Retry-After
Atomic path (Lua)
Read counter, compute new state, write, return — all inside a single EVAL. Redis's single-threaded execution guarantees no two decisions for the same key interleave, which is the only correctness property the limiter needs.
Plans warm the cache
Plan rows are hashed into Redis on first resolve and refreshed on every write via admin APIs. The hot POST /v1/check hits Postgres only on a cold tenant/plan pair.
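The cache-aside pattern described above can be sketched in a few lines. This is an illustrative model, not the service's code: plain dicts stand in for the Redis hash cache and the Postgres table, and `resolve_plan` / `update_plan` are hypothetical names.

```python
# Illustrative cache-aside plan lookup: dicts stand in for Redis (cache)
# and Postgres (store). In the real service the cache entry carries a TTL.

CACHE: dict[str, dict] = {}
STORE = {("t1", "pro"): {"algorithm": "token_bucket",
                         "bucket_capacity": 100,
                         "refill_rate_per_sec": 20}}

def resolve_plan(tenant_id: str, plan_name: str) -> dict:
    """Hot path: a cache hit never touches the store; a miss fills the cache."""
    key = f"plan:{tenant_id}:{plan_name}"
    plan = CACHE.get(key)
    if plan is None:                       # cold tenant/plan pair
        plan = STORE[(tenant_id, plan_name)]
        CACHE[key] = plan                  # fill so the next check is cache-only
    return plan

def update_plan(tenant_id: str, plan_name: str, fields: dict) -> None:
    """Admin write path: mutate the store, then refresh the cache entry."""
    STORE[(tenant_id, plan_name)] = {**STORE[(tenant_id, plan_name)], **fields}
    CACHE[f"plan:{tenant_id}:{plan_name}"] = STORE[(tenant_id, plan_name)]
```

The write path refreshes rather than invalidates, so a plan change never forces a Postgres read onto the next hot-path check.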
Subject granularity
The subject string is the rate-limit identity — it can be an end-user id, an API key hash, or a request IP. Concurrency plans also match on subject for tracking in-flight requests.
Header discipline
Every 200 and every 429 carry X-RateLimit-Limit / Remaining / Reset and Retry-After. Consumers can back off without re-requesting the plan — the state is on the wire.
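Because the state is on the wire, a client can compute its backoff from the response alone. A minimal sketch of such a client-side helper (the function and its fallback logic are illustrative; the header names are the ones listed above):

```python
import time

def backoff_seconds(status: int, headers: dict[str, str]) -> float:
    """Return 0 when the call may proceed, otherwise seconds to wait.

    Illustrative helper: trusts Retry-After first, then falls back to
    the X-RateLimit-Reset timestamp if the header is somehow absent.
    """
    if status != 429:
        return 0.0
    retry = headers.get("Retry-After")
    if retry is not None:
        return float(retry)                            # delay in seconds
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    return max(0.0, reset_at - time.time())
```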
Four limiter families. One decision contract.
Each algorithm is implemented as a Lua script with a stable return contract — { allowed, remaining, reset_at, retry_after_ms } — so the surrounding Python code is identical regardless of backend. Plans pick the one that matches the shape of traffic they're throttling.
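The shared contract might be modelled on the Python side roughly like this (a sketch; the field names mirror the tuple above, but the class and its `headers` method are illustrative, not the service's actual types):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    """One decision shape, whichever Lua script produced it."""
    allowed: bool
    remaining: int
    reset_at: float        # unix seconds when the limit fully resets
    retry_after_ms: int    # 0 when allowed

    def headers(self, limit: int) -> dict[str, str]:
        """Project the decision onto the response headers the service emits."""
        h = {
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": str(self.remaining),
            "X-RateLimit-Reset": str(int(self.reset_at)),
        }
        if not self.allowed:
            # Retry-After is whole seconds; round up so clients never retry early.
            h["Retry-After"] = str(max(1, -(-self.retry_after_ms // 1000)))
        return h
```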
Token bucket
A bucket of tokens refills continuously at r tok/s up to capacity C. Each call withdraws cost tokens; empty bucket ⇒ 429. Excellent when you want to tolerate short bursts but enforce the mean. Default choice.
tokens ← min(C, tokens + (now − ts) · r)
ts ← now
if tokens < cost:
    return (429, retry = (cost − tokens) / r)
tokens ← tokens − cost
return (200, remaining = floor(tokens))
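The pseudocode above can be modelled in plain Python. In the real service this arithmetic runs inside a Lua script; this in-memory sketch just mirrors it, with time passed in explicitly so the refill is easy to follow:

```python
def token_bucket(state: dict, now: float, cost: int, C: float, r: float):
    """Returns (status, remaining_or_retry). state holds 'tokens' and 'ts'."""
    tokens = min(C, state["tokens"] + (now - state["ts"]) * r)  # continuous refill
    state["ts"] = now
    if tokens < cost:
        state["tokens"] = tokens
        return 429, (cost - tokens) / r        # seconds until enough tokens exist
    state["tokens"] = tokens - cost
    return 200, int(state["tokens"])
```

With C=10 and r=2, eleven simultaneous calls admit the first ten (the burst up to the cap) and 429 the eleventh with a 0.5 s retry, which is exactly "tolerate bursts, enforce the mean".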
Fixed window
A counter per calendar window of W seconds; +1 per call, 429 when the counter exceeds L. Cheapest to compute (a single INCR + EXPIRE) and gives a strong per-window upper bound. Prone to edge bursts at window boundaries: up to 2L calls can land in the moments either side of a boundary.
key = tenant:subject:resource:floor(now/W)
n = INCR(key); if new: EXPIRE(key, W)
if n > L:
    return (429, retry = W − (now mod W))
return (200, remaining = L − n)
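An in-memory model of the same counter, with a dict standing in for Redis INCR + EXPIRE (illustrative only; expiry is implicit here because old window keys are simply never read again):

```python
import math

COUNTERS: dict[str, int] = {}

def fixed_window(subject: str, now: float, W: int, L: int):
    """One counter per calendar window; rolls over at each floor(now/W) boundary."""
    key = f"{subject}:{math.floor(now / W)}"
    n = COUNTERS[key] = COUNTERS.get(key, 0) + 1   # INCR
    if n > L:
        return 429, W - (now % W)                  # retry at the window boundary
    return 200, L - n
```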
Sliding window (log)
Timestamps live in a sorted set; on each call trim entries older than now − W, then accept only if |set| < L. Removes fixed-window's edge effect at the cost of O(L) memory per active subject — one timestamp per call retained within the window.
ZREMRANGEBYSCORE key −∞ now − W
k = ZCARD key
if k ≥ L:
    return (429, retry derived from oldest)
ZADD key now request_id
EXPIRE key W + 1
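The same log can be modelled with a sorted list of timestamps standing in for the ZSET (an illustrative sketch; `bisect` keeps the trim and insert cheap):

```python
import bisect

LOG: dict[str, list[float]] = {}

def sliding_window(subject: str, now: float, W: float, L: int):
    """Accept only if fewer than L calls landed in the last W seconds."""
    ts = LOG.setdefault(subject, [])
    del ts[:bisect.bisect_left(ts, now - W)]   # ZREMRANGEBYSCORE −∞ .. now−W
    if len(ts) >= L:                           # ZCARD
        return 429, ts[0] + W - now            # retry when the oldest entry ages out
    bisect.insort(ts, now)                     # ZADD
    return 200, L - len(ts)
```

Note the retry hint: it is derived from the oldest surviving timestamp, which is the "retry derived from oldest" step in the pseudocode above.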
Concurrency
Tracks open calls per subject, not rate. A request ticket is added to the subject's sorted set on entry; the call's id is evicted on completion (or by stale-TTL if the caller never returns). 429 when the open count hits C.
now = timestamp_ms()
ZREMRANGEBYSCORE key −∞ now − STALE_TTL
n = ZCARD key
if n ≥ C:
    return (429, retry = STALE_TTL)
ZADD key now request_uuid
return (200, remaining = C − n − 1)
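An in-memory model of the ticket lifecycle, with a dict of open request ids standing in for the ZSET (illustrative; `acquire`/`release` are hypothetical names for the entry and completion paths):

```python
OPEN: dict[str, dict[str, float]] = {}   # subject -> {request_id: entered_at_ms}

def acquire(subject: str, request_id: str, now_ms: int, C: int, stale_ttl_ms: int):
    """Admit a call only while fewer than C are in flight for this subject."""
    inflight = OPEN.setdefault(subject, {})
    # Reap tickets whose caller never released them (ZREMRANGEBYSCORE).
    for rid, t in list(inflight.items()):
        if t <= now_ms - stale_ttl_ms:
            del inflight[rid]
    if len(inflight) >= C:
        return 429, stale_ttl_ms
    inflight[request_id] = now_ms
    return 200, C - len(inflight)        # matches remaining = C − n − 1 above

def release(subject: str, request_id: str) -> None:
    """Evict the ticket on completion (ZREM in the real script)."""
    OPEN.get(subject, {}).pop(request_id, None)
```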
A narrow public surface, administered out-of-band.
Callers only ever hit /v1/check — the entire decision contract is expressed there. /v1/admin/* is behind a bearer token and is where tenants, plans, keys, and resource policies are provisioned. Everything is JSON; OpenAPI spec is live.
The hot path. Authenticates with x-api-key, resolves (or overrides) the plan, returns a decision in ~1ms.
curl -X POST /limitforge-rls/v1/check \
  -H "x-api-key: $KEY" \
  -H "content-type: application/json" \
  -d '{"resource":"GET:/orders","subject":"user:42",
       "cost":1,"plan_id":"…"}'
Cheap liveness. Returns {"status":"ok","version":"0.1.0"}. Also exported to Prometheus.
curl /limitforge-rls/v1/health
{"status":"ok","version":"0.1.0"}
Create a plan. Algorithm-specific fields are validated — e.g. token_bucket requires bucket_capacity + refill_rate_per_sec.
curl -X POST /v1/admin/plans \
  -H "Authorization: Bearer $ADMIN" \
  -d '{"tenant_id":"…","name":"pro",
       "algorithm":"token_bucket",
       "bucket_capacity":100,
       "refill_rate_per_sec":20}'
Mint an API key for a tenant. Secret is returned exactly once; salted-SHA256 hash is stored and cached.
curl -X POST /v1/admin/keys \
  -H "Authorization: Bearer $ADMIN" \
  -d '{"tenant_id":"…","name":"prod-app"}'
→ {"key":"…","key_hash":"…"}
Prometheus, structlog, OpenTelemetry — all first-class.
You can't fix a limiter you can't see. Every decision emits a Prometheus counter, a structured log line, and an OTEL span; no print()s, no ad-hoc metrics.
Metrics
rl_allowed_total and rl_blocked_total counters labelled by route and outcome. Plus requests_total{route,outcome} for every FastAPI route.
Logs
JSON lines via structlog. Every check logs algorithm, tenant, subject_hash, and outcome so ad-hoc analysis is one jq away.
Traces
OTEL instrumentation on FastAPI, SQLAlchemy and Redis. Each /v1/check span carries the plan id and decision. Export to any OTLP collector.
“A rate limiter is a contract, not a barrier — it tells the client exactly how to behave.”
— Design note · LimitForge RLS, 2026