A FastAPI service that detects 20+ PHI and PII categories (SSN, SIN, NPI, DEA, MRN, HICN, provincial health cards, street addresses, ZIP and postal codes, credit cards, IBAN and routing numbers) and applies per-label mask / hash / redact policies using spaCy NER, regex rules and deterministic span resolution.
Every detector is prioritized. Structured identifiers — SSN, NPI, credit card,
MRN — outrank generic NER spans on overlap so a patient ID never gets silently
merged into a person name. All patterns live in app/deid/regex_rules.py.
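The overlap rule described above can be sketched as follows. This is a minimal illustration, not the code in app/deid/regex_rules.py: the `Span` class, the two sample patterns, the numeric priorities and the helper names are all assumptions made for the example, and the NER span is hard-coded rather than produced by spaCy.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str
    start: int
    end: int
    priority: int  # assumed convention: higher number wins on overlap


# Hypothetical two-rung ladder; the real PATTERNS list is not shown in the source.
PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 100),
    ("PHONE_US", re.compile(r"\(\d{3}\) \d{3}-\d{4}"), 50),
]


def find_regex_spans(text):
    """Run every pattern in the ladder and collect candidate spans."""
    return [
        Span(label, m.start(), m.end(), priority)
        for label, rx, priority in PATTERNS
        for m in rx.finditer(text)
    ]


def resolve_overlaps(spans):
    """Keep the best span in each overlapping group.

    Candidates are ranked by priority (high first), then length (long first),
    then start offset (early first); a span is kept only if it does not
    overlap a span that was already kept.
    """
    ranked = sorted(spans, key=lambda s: (-s.priority, -(s.end - s.start), s.start))
    kept = []
    for s in ranked:
        if all(s.end <= k.start or s.start >= k.end for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)


text = "Patient Jane Roe 412-55-7891"
# Pretend the NER model over-extended a PERSON span across the SSN.
ner_span = Span("PERSON", 8, 28, 10)
spans = resolve_overlaps(find_regex_spans(text) + [ner_span])
```

In this sketch the over-extended PERSON span loses to the SSN span and is dropped rather than trimmed; a production resolver might instead clip the lower-priority span to its non-overlapping part.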
- … BWH-47882910) with context anchoring.
- … A1A 1A1 postal code.
- en_core_web_sm: first/last name detection, redacted by default.
Each category maps to exactly one action in a POLICY_MAP dict.
Mask replaces every character with * to preserve
column width. Hash emits a salted SHA-256 so records can
still be joined or counted. Redact swaps the value for a
[REDACTED:LABEL] tag when the span should vanish entirely.
| Label | Input | Action | Output |
|---|---|---|---|
| SSN | 412-55-7891 | hash | SSN_HASH:7d1c45…a23988e |
| NPI | NPI# 1538296472 | hash | NPI_HASH:2b039f…89c47f |
| MRN | BWH-47882910 | hash | MRN_HASH:4f2a11…7ffc91 |
| CREDIT_CARD | 4532 8827 1104 9951 | mask | ******************* |
| PHONE_US | (617) 555-0134 | mask | ************** |
| DATE | 04/02/2026 | mask | ********** |
| PERSON | Eleanor R. Whitfield | redact | [REDACTED:PERSON] |
| US_STREET | 482 Commonwealth Avenue | redact | [REDACTED:ADDRESS] |
| IP | 72.14.212.85 | redact | [REDACTED:IP] |
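The three actions can be sketched in a few lines. This is an illustration under stated assumptions, not the service's implementation: the `POLICY_MAP` entries are a subset copied from the table above, while `SALT`, `apply_policy` and the fail-closed default for unknown labels are hypothetical. The salt-then-value concatenation is one plausible reading of a salted SHA-256.

```python
import hashlib

# Subset of label -> action pairs taken from the table above.
POLICY_MAP = {"SSN": "hash", "PHONE_US": "mask", "PERSON": "redact"}

# Placeholder salt for the example; a real deployment would load a secret.
SALT = b"demo-salt"


def apply_policy(label, value, salt=SALT):
    """Return the replacement text for one detected value."""
    # Assumption: labels missing from the map are redacted (fail closed).
    action = POLICY_MAP.get(label, "redact")
    if action == "mask":
        # One asterisk per character, so the output keeps the input width.
        return "*" * len(value)
    if action == "hash":
        # Same salt and value always give the same digest, so rows stay joinable.
        digest = hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
        return f"{label}_HASH:{digest}"
    return f"[REDACTED:{label}]"


masked = apply_policy("PHONE_US", "(617) 555-0134")
hashed = apply_policy("SSN", "412-55-7891")
redacted = apply_policy("PERSON", "Eleanor R. Whitfield")
```

The sketch emits the full 64-character hex digest and uses the label itself as the redaction tag. The table suggests the service may map some labels to a different tag (US_STREET becomes [REDACTED:ADDRESS]), which would need an extra lookup.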
FastAPI service with a Pydantic-validated transport, a SlowAPI rate limiter and a 1 MB body cap. Detection runs in-process; long-running evaluation jobs hand off to Celery over Redis. Metrics persist in Postgres for auditability.
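The pre-detection checks can be pictured with a framework-free sketch. It does not use FastAPI, Pydantic or SlowAPI and does not model rate limiting. The function name, the exception class, the HTTP status codes and the `MAX_TEXT_SIZE` value are assumptions for illustration; only the 1 MB body cap comes from the description above.

```python
import hmac

MAX_BODY_BYTES = 1024 * 1024  # 1 MB cap, read here as 1,048,576 bytes
MAX_TEXT_SIZE = 100_000       # hypothetical value; the service makes this configurable


class RequestRejected(Exception):
    """Raised when a request fails a guard; carries an assumed HTTP status."""

    def __init__(self, status, detail):
        super().__init__(detail)
        self.status = status
        self.detail = detail


def check_request(api_key, expected_key, body, text):
    """Run the guards in order: API key, raw body size, decoded text size."""
    # compare_digest avoids leaking key contents through timing differences.
    if api_key is None or not hmac.compare_digest(
        api_key.encode("utf-8"), expected_key.encode("utf-8")
    ):
        raise RequestRejected(401, "invalid or missing API key")
    if len(body) > MAX_BODY_BYTES:
        raise RequestRejected(413, "request body too large")
    if len(text) > MAX_TEXT_SIZE:
        raise RequestRejected(413, "text exceeds MAX_TEXT_SIZE")
```

Checking the raw body size before looking at the text means oversized payloads are rejected before any parsing or NER work is done.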
    # ─────────────────────────────────────────────────────────────
    # Request path
    # ─────────────────────────────────────────────────────────────
    client                  fastapi                         deid engine
    │                          │                                 │
    │ POST /api/v1/deid        │                                 │
    ├─────────────────────────▶│ require_api_key · rate_limit    │
    │                          │ max_body 1MB · max_text_size    │
    │                          ├────────────────────────────────▶│
    │                          │                                 │ spaCy NER
    │                          │                                 │ en_core_web_sm
    │                          │                                 │      │
    │                          │                                 │      ▼
    │                          │                                 │ regex ladder
    │                          │                                 │ PATTERNS[] sorted
    │                          │                                 │ by priority
    │                          │                                 │      │
    │                          │                                 │      ▼
    │                          │                                 │ POLICY_MAP
    │                          │                                 │ SSN→hash PHONE→mask
    │                          │                                 │ NAME→redact DATE→mask
    │                          │                                 │      │
    │ { result_text,           │ JSONResponse                    │      ▼
    │◀─ entities, time_ms }  ◀─┤◀────────────────────────────────┤ span resolution
    │                          │                                 │ (priority · length · start)

    # ─────────────────────────────────────────────────────────────
    # Async jobs · long evaluations
    # ─────────────────────────────────────────────────────────────
    fastapi ──enqueue──▶ celery ──▶ redis broker ──▶ worker ──▶ postgres MetricRun
- en_core_web_sm NER + prioritized regex rules (PATTERNS ladder). Overlaps resolved by priority / length / start.
- mask replaces every character with *; redact inserts [REDACTED:LABEL]; hash emits LABEL_HASH:<SHA-256(salt‖value)>.
- … A1A 1A1 …
- POST /api/v1/deid (single) · POST /api/v1/deid/file (batch) · Celery job queue endpoints for long-running evaluation.
- X-API-Key header, SlowAPI rate-limiter (30/min), 1 MB request body cap, configurable MAX_TEXT_SIZE.
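A client call to the single-document endpoint might look like the sketch below, using only the standard library. The endpoint path and the X-API-Key header come from the summary above. The base URL, the `text` field name in the JSON body and the `build_deid_request` helper are assumptions, since the request schema is not shown.

```python
import json
import urllib.request


def build_deid_request(text, api_key, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /api/v1/deid."""
    # Assumption: the service accepts a JSON object with a "text" field.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/deid",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


req = build_deid_request("SSN 412-55-7891", api_key="demo-key")

# To send it against a running instance:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
```

Going by the request-path diagram, the JSON response should carry `result_text`, `entities` and `time_ms`. At 30 requests per minute, a client that processes many documents would need to pace itself or use the batch endpoint.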