Aegis DeID
PHI / PII Redaction Studio · v0.2.0
No. 01 · Privacy Studio · Policy Governed

Redact protected health & personally identifiable information
with a policy-governed, fully auditable redaction engine.

A FastAPI service that detects 20+ PHI and PII categories, including SSN, SIN, NPI, DEA, MRN, HICN, provincial health cards, street addresses, ZIP and postal codes, credit cards, IBAN and routing numbers. Detection combines spaCy NER with regex rules and deterministic span resolution; each label then gets its own mask, hash or redact policy.

20+ identifier categories
2 detection layers
< 30 ms median latency per 4 KB
3 policy actions
clinical-note.txt · redacted ✓ 9 entities · 12 ms
Source
Patient: Eleanor R. Whitfield MRN: BWH-47882910 DOB: 03/14/1968 SSN: 412-55-7891 Phone: (617) 555-0134 NPI: 1538296472 Visit: 04/02/2026 — chest discomfort.
Redacted
Patient: [REDACTED:PERSON] MRN: MRN_HASH:4f2a…c91 DOB: ********** SSN: SSN_HASH:7d1c…88e Phone: ************** NPI: NPI_HASH:2b03…47f Visit: ********** — chest discomfort.

02 Redaction Desk

Hash (SSN, NPI, MRN, HICN…) · Mask (Phone, Credit Card, IBAN, Date…) · Redact (Person, Address, ZIP, URL, IP…)

03 Detected Entities · Audit Ledger

Ledger columns: # · Label · Matched Value · Action · Span
Signed · aegis-deid engine
Pattern Ladder

What the engine sees.

Every detector is prioritized. Structured identifiers — SSN, NPI, credit card, MRN — outrank generic NER spans on overlap so a patient ID never gets silently merged into a person name. All patterns live in app/deid/regex_rules.py.
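
The overlap rule described above (priority first, then length, then start offset) can be sketched as a greedy selection. This is a minimal illustration rather than the code in app/deid/regex_rules.py: the Span class, the resolve_overlaps helper and the priority numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int      # inclusive character offset
    end: int        # exclusive character offset
    label: str
    priority: int   # higher wins; the numbers used below are illustrative


def overlaps(a: Span, b: Span) -> bool:
    """True when the two half-open ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Keep a non-overlapping subset: priority, then longer span, then earlier start."""
    ranked = sorted(spans, key=lambda s: (-s.priority, -(s.end - s.start), s.start))
    kept: list[Span] = []
    for cand in ranked:
        if not any(overlaps(cand, k) for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda s: s.start)


text = "Patient: Eleanor R. Whitfield MRN: BWH-47882910"
name_start = text.index("Eleanor")
name_end = name_start + len("Eleanor R. Whitfield")
mrn_start = text.index("BWH-47882910")
mrn_end = mrn_start + len("BWH-47882910")

candidates = [
    # An over-wide NER span that swallows "MRN: BWH" (assumed failure mode).
    Span(name_start, mrn_start + 3, "PERSON", 10),
    Span(name_start, name_end, "PERSON", 10),
    Span(mrn_start, mrn_end, "MRN", 90),
]
resolved = resolve_overlaps(candidates)
for s in resolved:
    print(s.label, repr(text[s.start:s.end]))
```

Because the MRN span is ranked first, the over-wide PERSON candidate that collides with it is discarded and the clean name span survives, so the record ID is never merged into the name.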

SSN
Social Security
AAA-GG-SSSS with 000/666/9xx prefix guards. Hashed, never stored plain.
Hash
SIN
Social Insurance
9-digit Social Insurance Number grouped 3-3-3 with space or hyphen separators.
Hash
NPI
Provider ID
10-digit National Provider Identifier, labeled or in-context with "provider".
Hash
DEA
DEA Number
Controlled-substance prescriber ID: 2 letters + 7 digits.
Hash
MRN
Medical Record #
Hospital-prefixed record IDs (e.g. BWH-47882910) with context anchoring.
Hash
HICN / MBI
Medicare Claim
Medicare beneficiary identifiers in the newer MBI layout.
Hash
HEALTH_CARD
Provincial Health
Provincial health card numbers (e.g. 4532-281-947-AB).
Hash
PASSPORT
Passport
1 letter + 8 digits or 9 digits with context anchoring.
Hash
CREDIT_CARD
Payment Card
13–19 digit PAN with hyphen/space grouping. Masked by default.
Mask
ABA
Routing Number
9-digit ABA routing numbers, labeled as "ABA" or "routing".
Mask
IBAN
International Bank
Country prefix + check digits + BBAN — wire-instruction traces.
Mask
PHONE
NANP Phone
All North American Numbering Plan formats: (xxx) xxx-xxxx, +1 xxx, dotted…
Mask
DATE
Clinical Date
MM/DD/YYYY, YYYY-MM-DD — admission, DOB, visit timestamps.
Mask
EMAIL
Email Address
RFC-lite recognizer; values are hashed to preserve join keys across datasets.
Hash
STREET
Street Address
Number + name + Street/Ave/Blvd/Pkwy suffix with directional prefix.
Redact
ZIP / POSTAL
Postal Codes
5-digit ZIP or 9-digit ZIP+4, and alphanumeric A1A 1A1 postal codes.
Redact
PERSON
Name (NER)
spaCy en_core_web_sm — first/last name detection, redacted by default.
Redact
GPE · LOC
Places (NER)
Cities, states, countries and geopolitical entities from NER.
Redact
IP
IPv4
Dotted-quad IPs in audit trails and access logs.
Redact
URL
Web URL
HTTP(S) URLs — portal sessions, cloud dashboards, file links.
Redact
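
A ladder like the one above could be held as a priority-sorted list of compiled patterns. The sketch below is an assumption-laden illustration, not the contents of app/deid/regex_rules.py: the three regexes, the priority values and the find_matches helper are hypothetical, and only the SSN prefix guards (000, 666, 9xx) come from the card text.

```python
import re

# (label, priority, pattern). Priorities and regexes are illustrative only.
PATTERNS = [
    ("DATE", 40, re.compile(r"\b\d{2}/\d{2}/\d{4}\b|\b\d{4}-\d{2}-\d{2}\b")),
    ("IP", 30, re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")),
    # Negative lookahead rejects the 000, 666 and 9xx area prefixes.
    ("SSN", 100, re.compile(r"\b(?!000|666|9\d\d)\d{3}-\d{2}-\d{4}\b")),
]
# Sort once so higher-priority detectors run first.
PATTERNS.sort(key=lambda entry: -entry[1])


def find_matches(text: str) -> list[tuple[int, int, str, int, str]]:
    """Return (start, end, label, priority, value) for every pattern hit."""
    hits = []
    for label, priority, rx in PATTERNS:
        for m in rx.finditer(text):
            hits.append((m.start(), m.end(), label, priority, m.group()))
    return hits


sample = "SSN: 412-55-7891 DOB: 03/14/1968 from 72.14.212.85; rejected: 666-12-3456"
hits = find_matches(sample)
for start, end, label, priority, value in hits:
    print(label, priority, value)
```

The hits still need overlap resolution before any policy is applied; this sketch only covers the matching step.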
Policy Actions

Three modes, per label.

Each category maps to exactly one action in a POLICY_MAP dict. Mask replaces every character with * to preserve column width. Hash emits a salted SHA-256 so records can still be joined or counted. Redact swaps the value for a [REDACTED:LABEL] tag when the span should vanish entirely.

Label        Input                    Action  Output
SSN          412-55-7891              hash    SSN_HASH:7d1c45…a23988e
NPI          NPI# 1538296472          hash    NPI_HASH:2b039f…89c47f
MRN          BWH-47882910             hash    MRN_HASH:4f2a11…7ffc91
CREDIT_CARD  4532 8827 1104 9951      mask    *******************
PHONE_US     (617) 555-0134           mask    **************
DATE         04/02/2026               mask    **********
PERSON       Eleanor R. Whitfield     redact  [REDACTED:PERSON]
US_STREET    482 Commonwealth Avenue  redact  [REDACTED:ADDRESS]
IP           72.14.212.85             redact  [REDACTED:IP]
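
The three actions can be sketched in a few lines of standard-library Python. Treat this as an illustration under stated assumptions: the SALT value, the ACTIONS table, the apply_policy helper and the fall-back to redact for unknown labels are hypothetical, and the redact tag here simply reuses the label, whereas the table shows the engine can emit a different tag (US_STREET becomes ADDRESS). The hash is written out in full; the table abbreviates it for display.

```python
import hashlib

SALT = b"demo-salt"  # hypothetical; a real deployment would load a secret salt


def mask(value: str, label: str) -> str:
    """Replace every character with '*' so the column width is preserved."""
    return "*" * len(value)


def hash_value(value: str, label: str) -> str:
    """Emit LABEL_HASH:<SHA-256(salt || value)> so equal inputs stay joinable."""
    digest = hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()
    return f"{label}_HASH:{digest}"


def redact(value: str, label: str) -> str:
    """Drop the value entirely and leave a labeled tag in its place."""
    return f"[REDACTED:{label}]"


ACTIONS = {"mask": mask, "hash": hash_value, "redact": redact}

# One action per label; only a few labels are shown here.
POLICY_MAP = {"SSN": "hash", "PHONE": "mask", "DATE": "mask", "PERSON": "redact"}


def apply_policy(label: str, value: str) -> str:
    # Assumption: unknown labels fail closed by being redacted.
    action = POLICY_MAP.get(label, "redact")
    return ACTIONS[action](value, label)


print(apply_policy("PHONE", "(617) 555-0134"))
print(apply_policy("PERSON", "Eleanor R. Whitfield"))
print(apply_policy("SSN", "412-55-7891"))
```

Hashing with a fixed salt is what keeps the output deterministic: the same SSN always yields the same token, so records can be joined or counted without exposing the original value.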
Stack

Under the hood.

FastAPI service with a Pydantic-validated transport, a SlowAPI rate limiter and a 1 MB body cap. Detection runs in-process; long-running evaluation jobs hand off to Celery over Redis. Metrics persist in Postgres for auditability.

# ─────────────────────────────────────────────────────────────
# Request path
# ─────────────────────────────────────────────────────────────
   client                    fastapi                          deid engine
     │                          │                                 │
     │  POST /api/v1/deid       │                                 │
     ├─────────────────────────▶│  require_api_key · rate_limit   │
     │                          │  max_body 1MB · max_text_size   │
     │                          ├────────────────────────────────▶│
     │                          │                                 │  spaCy NER
     │                          │                                 │  en_core_web_sm
     │                          │                                 │   │
     │                          │                                 │   ▼
     │                          │                                 │  regex ladder
     │                          │                                 │  PATTERNS[] sorted
     │                          │                                 │  by priority
     │                          │                                 │   │
     │                          │                                 │   ▼
     │                          │                                 │  POLICY_MAP
     │                          │                                 │  SSN→hash  PHONE→mask
     │                          │                                 │  NAME→redact  DATE→mask
     │                          │                                 │   │
     │  { result_text,          │  JSONResponse                   │   ▼
     │◀─  entities, time_ms } ◀─┤◀────────────────────────────────┤  span resolution
     │                          │                                 │  (priority · length · start)

# ─────────────────────────────────────────────────────────────
# Async jobs · long evaluations
# ─────────────────────────────────────────────────────────────
   fastapi ──enqueue──▶ celery ──▶ redis broker ──▶ worker ──▶ postgres MetricRun
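
The last step of the request path, turning resolved spans into the { result_text, entities, time_ms } response, might look like the sketch below. The response keys come from the diagram; the entity field names, the make_entity helper and the timing scope are assumptions made for illustration.

```python
import time


def apply_replacements(text: str, entities: list[dict]) -> str:
    """Splice replacements into text; entities must not overlap."""
    parts = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        parts.append(text[cursor:ent["start"]])  # untouched text before the span
        parts.append(ent["replacement"])
        cursor = ent["end"]
    parts.append(text[cursor:])  # tail after the last span
    return "".join(parts)


def build_response(text: str, entities: list[dict]) -> dict:
    """Assemble the response body and time the rewrite step in milliseconds."""
    started = time.perf_counter()
    result_text = apply_replacements(text, entities)
    elapsed_ms = (time.perf_counter() - started) * 1000
    return {"result_text": result_text, "entities": entities, "time_ms": round(elapsed_ms, 3)}


def make_entity(text: str, value: str, label: str, action: str, replacement: str) -> dict:
    """Hypothetical helper: locate value in text and describe it as a ledger row."""
    start = text.index(value)
    return {"label": label, "value": value, "action": action,
            "start": start, "end": start + len(value), "replacement": replacement}


sample = "DOB: 03/14/1968 Phone: (617) 555-0134"
entities = [
    make_entity(sample, "(617) 555-0134", "PHONE", "mask", "*" * 14),
    make_entity(sample, "03/14/1968", "DATE", "mask", "*" * 10),
]
response = build_response(sample, entities)
print(response["result_text"])
```

Building the output from slices between spans avoids the offset drift that in-place replacement causes when a replacement is longer or shorter than the original value.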

07 Technical Spec

Everything under the hood
Detection
spaCy en_core_web_sm NER + prioritized regex rules (PATTERNS ladder). Overlaps resolved by priority / length / start.
Policies
mask replaces every char with *; redact inserts [REDACTED:LABEL]; hash emits LABEL_HASH:<SHA-256(salt‖value)>.
Gov't & healthcare IDs
SSN · SIN · NPI · DEA · Passport · MRN · HICN / MBI · Provincial health card · MM/DD/YYYY + ISO dates
Financial & location
Credit card · ABA routing · IBAN · Street address · ZIP / ZIP+4 · Postal code A1A 1A1
Transport
POST /api/v1/deid (single) · POST /api/v1/deid/file (batch) · Celery job queue endpoints for long-running evaluation.
Auth & limits
X-API-Key header, SlowAPI rate-limiter (30/min), 1 MB request body cap, configurable MAX_TEXT_SIZE.
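
A client call against the single-document endpoint could be prepared as follows. This is a sketch, not documented client code: the JSON field name "text", the base URL, the exact byte value used for the 1 MB cap and the build_deid_request helper are assumptions, while the path and the X-API-Key header come from the spec above. The request is only constructed here, not sent.

```python
import json
import urllib.request

MAX_BODY_BYTES = 1_000_000  # assumption: the page says "1 MB" without an exact byte count


def build_deid_request(base_url: str, api_key: str, text: str) -> urllib.request.Request:
    """Build a POST /api/v1/deid request, refusing bodies over the size cap."""
    body = json.dumps({"text": text}).encode("utf-8")  # field name "text" is assumed
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"body is {len(body)} bytes; cap is {MAX_BODY_BYTES}")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/deid",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


# Hypothetical local deployment and key, for illustration only.
req = build_deid_request("http://localhost:8000", "demo-key", "SSN: 412-55-7891")
print(req.get_method(), req.full_url)
```

Checking the size before sending saves a round trip for oversized payloads. At the stated limit of 30 requests per minute, a client sending large batches would also need to pace its calls or use the file endpoint.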