Local, offline PII (Personally Identifiable Information) detection using Microsoft Presidio and spaCy.
- Local-only processing: No internet access required, no cloud API calls
- Offline models: Uses a pre-downloaded spaCy model (en_core_web_md)
- No auto-downloads: Explicitly disables transformers, Stanza, and Hugging Face Hub
- Custom recognition: Includes organization detection via spaCy NER
- Privacy-focused: All data stays on your machine
- Python 3.8 or later
- pip (Python package manager)
Run the setup script (Windows):
setup_ner.bat

Or install manually:
- Install Python packages:
  pip install -r requirements.txt
- Download the spaCy model:
  python -m spacy download en_core_web_md
- Verify the installation:
  python check_spacy_model.py
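The contents of check_spacy_model.py are not reproduced here. As a rough sketch of an equivalent check, assuming the model is installed as an importable Python package named en_core_web_md (which is how `python -m spacy download` installs spaCy models), the following only tests importability and does not load the model. The helper name `model_is_installed` is hypothetical.

```python
import importlib.util


def model_is_installed(model_name: str = "en_core_web_md") -> bool:
    """Return True if both spaCy and the named model package are importable.

    Hypothetical helper: it checks that the packages can be found on the
    import path; it does not load the model into memory.
    """
    for package in ("spacy", model_name):
        if importlib.util.find_spec(package) is None:
            return False
    return True
```

Calling `model_is_installed()` before starting the detector gives a fast failure with a clear cause instead of an error deep inside Presidio.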
From Python:
from presidio_detector import detect_pii
result = detect_pii("My name is John Doe and my email is john@example.com")
print(result)

From the command line, pipe a JSON object with a "text" field to the script:
echo '{"text": "Call me at 555-1234 or email test@example.com"}' | python presidio_detector.py

Project files:
- presidio_detector.py: Main PII detection engine
- check_spacy_model.py: Validation utility for spaCy model installation
- requirements.txt: Python dependencies (Presidio, spaCy)
- setup_ner.bat: Windows setup script (optional)
- setup_ner.ps1: PowerShell setup script (optional)
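presidio_detector.py itself is not shown in this README. The sketch below is one way a detector with the same stdin interface could be wired to Presidio's `AnalyzerEngine` using the local en_core_web_md model. The output shape (a JSON list of entities) and the helper names `parse_request`, `serialize_results`, `build_analyzer`, and `main` are assumptions for illustration, not the project's actual code. Presidio is imported lazily so the JSON helpers work on their own.

```python
import json
import sys
from typing import Any, Iterable


def parse_request(raw: str) -> str:
    """Extract the text to scan from a JSON payload like {"text": "..."}."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or not isinstance(payload.get("text"), str):
        raise ValueError('expected a JSON object with a string "text" field')
    return payload["text"]


def serialize_results(results: Iterable[Any]) -> str:
    """Convert Presidio RecognizerResult-like objects into a JSON list of dicts."""
    return json.dumps(
        [
            {"entity_type": r.entity_type, "start": r.start, "end": r.end, "score": r.score}
            for r in results
        ]
    )


def build_analyzer():
    """Create an AnalyzerEngine backed by the locally installed spaCy model."""
    # Lazy imports: only needed when actually analyzing text.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    config = {
        "nlp_engine_name": "spacy",
        "models": [{"lang_code": "en", "model_name": "en_core_web_md"}],
    }
    nlp_engine = NlpEngineProvider(nlp_configuration=config).create_engine()
    return AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["en"])


def main() -> None:
    """Read {"text": ...} from stdin and print the detected entities as JSON."""
    text = parse_request(sys.stdin.read())
    results = build_analyzer().analyze(text=text, language="en")
    print(serialize_results(results))
```

Calling `main()` from an `if __name__ == "__main__":` guard would give the `echo ... | python presidio_detector.py` behavior described in the usage section.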
Environment variables (set automatically in code):
- HF_DATASETS_OFFLINE=1: Disables Hugging Face Datasets auto-downloads
- TRANSFORMERS_OFFLINE=1: Disables transformers auto-downloads
- HF_HUB_OFFLINE=1: Disables Hugging Face Hub access
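Some of these libraries read the variables at import time, so the safe order is to set them before importing Presidio or anything that may pull in transformers or huggingface_hub. A minimal sketch of doing that at the top of a script; the helper name `enforce_offline_mode` is hypothetical, and the project's own code may set the variables differently.

```python
import os

# The variables listed above; a value of "1" keeps each library offline.
OFFLINE_ENV_VARS = ("HF_DATASETS_OFFLINE", "TRANSFORMERS_OFFLINE", "HF_HUB_OFFLINE")


def enforce_offline_mode() -> None:
    """Force offline mode for Hugging Face libraries in the current process."""
    for name in OFFLINE_ENV_VARS:
        os.environ[name] = "1"


# Run before importing presidio_analyzer or anything that may import
# transformers / huggingface_hub.
enforce_offline_mode()
```

Assigning directly (rather than with `os.environ.setdefault`) means a user's shell setting cannot accidentally re-enable network access.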
Detected PII entity types:
- PERSON
- EMAIL_ADDRESS
- PHONE_NUMBER
- CREDIT_CARD
- ORGANIZATION
- DATE_TIME
- Other PII categories covered by Presidio's predefined recognizers
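To see which of these entity types appear in a piece of text, results can be grouped by type. This sketch assumes the results expose Presidio's `RecognizerResult` attributes (`entity_type`, `start`, `end`, `score`); the exact return format of `detect_pii` is not documented here, and `group_by_entity_type` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_entity_type(
    text: str, results: Iterable[Any], min_score: float = 0.0
) -> Dict[str, List[str]]:
    """Map each entity type (PERSON, EMAIL_ADDRESS, ...) to its matched substrings.

    Results scoring below `min_score` are dropped.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for result in results:
        if result.score >= min_score:
            # start/end are character offsets into the analyzed text.
            grouped[result.entity_type].append(text[result.start:result.end])
    return dict(grouped)
```

Raising `min_score` is a simple way to trade recall for precision, since NER-based types such as PERSON or ORGANIZATION often score lower than pattern-based ones such as EMAIL_ADDRESS.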
Run the setup script:
setup_ner.bat

Or install manually:
pip install -r requirements.txt
python -m spacy download en_core_web_md

Verify the installation:
python check_spacy_model.py

If verification fails, reinstall the model:
python -m spacy download en_core_web_md
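The reinstall step can also be scripted. A minimal sketch that runs the same `python -m spacy download` command with the current interpreter (`sys.executable`), so the model is installed into the same environment that runs the detector; the function names are hypothetical. Downloading requires internet access, so this is a one-time setup step, not part of offline detection.

```python
import subprocess
import sys
from typing import List


def download_command(model_name: str = "en_core_web_md") -> List[str]:
    """Build the spaCy download command for the current Python interpreter."""
    return [sys.executable, "-m", "spacy", "download", model_name]


def reinstall_model(model_name: str = "en_core_web_md") -> int:
    """Run the download and return its exit code (0 means success)."""
    completed = subprocess.run(download_command(model_name), check=False)
    return completed.returncode
```

Using `sys.executable` instead of a bare `python` avoids installing the model into a different interpreter than the one running the detector, a common cause of "model not found" errors with virtual environments.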