OpenAI releases Privacy Filter for PII detection and redaction
OpenAI has released Privacy Filter, an open-weight model for detecting and redacting personally identifiable information (PII) in text. The model is designed to run locally, meaning sensitive data can be redacted without leaving your machine.
Privacy Filter is a 1.5B parameter bidirectional token-classification model that can detect eight categories of private information:
- private_person – Personal names and identifiers
- private_address – Physical addresses
- private_email – Email addresses
- private_phone – Phone numbers
- private_url – Personal URLs
- private_date – Private dates (birthdays, etc.)
- account_number – Banking/credit card numbers
- secret – Passwords, API keys, and other secrets
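Assuming the model exposes the standard Hugging Face token-classification output (entity spans with character offsets and an `entity_group` label), applying its detections to produce redacted text could look like the sketch below. The span data is hand-written for illustration, not real model output:

```python
def redact(text, entities):
    """Replace each detected span with a [CATEGORY] placeholder.

    `entities` is a list of dicts shaped like the output of the
    transformers token-classification pipeline with span aggregation:
    {"entity_group": ..., "start": ..., "end": ...}.
    """
    # Work right-to-left so earlier character offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"] :]
    return text


# Hypothetical detections, mimicking the pipeline's output format.
text = "Contact Jane Doe at jane@example.com for details."
entities = [
    {"entity_group": "private_person", "start": 8, "end": 16},
    {"entity_group": "private_email", "start": 20, "end": 36},
]
print(redact(text, entities))
# → Contact [PRIVATE_PERSON] at [PRIVATE_EMAIL] for details.
```

Processing spans from right to left is the easy way to avoid recomputing offsets as the placeholders change the string length.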
The model achieves 96% F1 score on the PII-Masking-300k benchmark and supports context windows up to 128,000 tokens, allowing it to process long documents in a single pass.
What makes this particularly interesting to me is its potential for privacy-preserving data-processing workflows, which I want to try in the future. Because the model runs locally, sensitive information can be redacted before data is sent to external services or written to logs.
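The log-scrubbing case can be wired up with Python's standard `logging.Filter` hook. In the sketch below, `detect_pii` is a trivial regex stub standing in for the real model, so the example is self-contained; in practice it would call the locally running Privacy Filter instead:

```python
import logging
import re


def detect_pii(text):
    """Stub detector: flags email-like strings only. A real deployment
    would replace this with a call to the local Privacy Filter model."""
    return [
        (m.start(), m.end(), "private_email")
        for m in re.finditer(r"\S+@\S+\.\S+", text)
    ]


class RedactingFilter(logging.Filter):
    """Scrub detected PII from every record before it is emitted."""

    def filter(self, record):
        msg = record.getMessage()  # resolve %-style args first
        # Replace spans right-to-left so offsets stay valid.
        for start, end, label in sorted(detect_pii(msg), reverse=True):
            msg = msg[:start] + f"[{label.upper()}]" + msg[end:]
        record.msg, record.args = msg, None  # freeze the scrubbed message
        return True


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)

logger.warning("Password reset requested by bob@example.com")
# The handler emits: Password reset requested by [PRIVATE_EMAIL]
```

Attaching the filter to the handler (rather than the logger) means the raw text never reaches the output stream, which is the point of redacting before storage.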
The model is released under the Apache 2.0 license on GitHub and Hugging Face, making it easy to integrate into existing pipelines.