Worka PII
Worka PII is a Rust-first library for detecting and anonymizing personally identifiable information (PII). It is designed for deterministic output, capability-aware NLP, and audit-friendly redaction so teams can safely run AI workflows in CPU-only environments.
This documentation covers the concepts, architecture, and APIs that make Worka PII predictable and composable in production systems. Use it when you need stable byte offsets, explicit policies, and a pipeline that degrades gracefully when language features are unavailable.
What it provides
- Deterministic detection with stable byte offsets.
- A modular pipeline of recognizers, validators, and optional NER.
- Policy-driven anonymization with explicit operators per entity type.
- An audit-friendly output model that preserves the original spans.
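To make "stable byte offsets" concrete, here is a minimal sketch using only the Rust standard library. It does not use the Worka PII API: `Span` and `find_span` are hypothetical names introduced for illustration. It shows that a span expressed as UTF-8 byte offsets slices the original text exactly, and that a byte offset can differ from a character index once the text contains non-ASCII characters.

```rust
/// Hypothetical span type for illustration; not the library's entity model.
#[derive(Debug, PartialEq)]
struct Span {
    start: usize, // inclusive byte offset into the original text
    end: usize,   // exclusive byte offset into the original text
}

/// Locate the first occurrence of `needle` and report it as a byte span.
fn find_span(text: &str, needle: &str) -> Option<Span> {
    let start = text.find(needle)?; // `str::find` returns a byte index
    Some(Span { start, end: start + needle.len() })
}

fn main() {
    // "\u{eb}" is 'ë', which occupies two bytes in UTF-8.
    let text = "Zo\u{eb}: zoe@example.com";
    let span = find_span(text, "zoe@example.com").expect("needle is present");

    // Count characters before the span to compare with the byte offset.
    let char_index = text[..span.start].chars().count();
    println!("byte span = {}..{}, char index = {}", span.start, span.end, char_index);

    // Slicing by the byte span recovers the detected value exactly.
    assert_eq!(&text[span.start..span.end], "zoe@example.com");
    assert_ne!(span.start, char_index);
}
```

Because the span is tied to the bytes of the original input, the same input always yields the same span, which is the property an audit log needs in order to point back at the source text.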
How it fits into Worka
Worka uses Worka PII to sanitize prompts, tool inputs, and stored artifacts before they reach external systems. The same deterministic spans feed event logs and audit trails, so any redaction can be reproduced later.
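The reproducibility claim can be illustrated with a short standard-library sketch. It is not the library's implementation: `Detection`, its fields, and `redact` are hypothetical, and the sketch assumes spans do not overlap and fall on UTF-8 character boundaries. Replacements are applied from the end of the text backwards so earlier offsets stay valid, and the result depends only on the input text and the spans, not on the order in which they were reported.

```rust
use std::cmp::Reverse;

/// Hypothetical detection record; field names are assumptions for illustration.
struct Detection {
    start: usize,              // inclusive byte offset
    end: usize,                // exclusive byte offset
    replacement: &'static str, // text substituted for the span
}

/// Replace each detected byte range, processing the highest offsets first
/// so that replacing one span never shifts the offsets of spans before it.
/// Assumes non-overlapping spans that fall on UTF-8 character boundaries.
fn redact(text: &str, detections: &[Detection]) -> String {
    let mut ordered: Vec<&Detection> = detections.iter().collect();
    ordered.sort_by_key(|d| Reverse(d.start));

    let mut out = text.to_string();
    for d in ordered {
        out.replace_range(d.start..d.end, d.replacement);
    }
    out
}

fn main() {
    let text = "Email jane@example.com or call +1 415-555-1212.";
    let email = "jane@example.com";
    let phone = "+1 415-555-1212";

    // Derive byte spans from the text itself rather than hard-coding them.
    let e = text.find(email).expect("email is present");
    let p = text.find(phone).expect("phone is present");
    let detections = [
        Detection { start: p, end: p + phone.len(), replacement: "<PHONE>" },
        Detection { start: e, end: e + email.len(), replacement: "<EMAIL>" },
    ];

    let redacted = redact(text, &detections);
    assert_eq!(redacted, "Email <EMAIL> or call <PHONE>.");
    println!("{redacted}");
}
```

Sorting before applying is what makes the output independent of detection order; a log that stores the spans alongside a policy can therefore replay the same redaction on the same input.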
Quick start
use std::collections::HashMap;

use pii::anonymize::{AnonymizeConfig, Anonymizer, Operator};
use pii::nlp::SimpleNlpEngine;
use pii::presets::default_recognizers;
use pii::types::Language;
use pii::{Analyzer, PolicyConfig};

fn main() {
    // Build an analyzer with the simple NLP engine, the default recognizers,
    // and the default policy.
    let analyzer = Analyzer::new(
        Box::new(SimpleNlpEngine::default()),
        default_recognizers(),
        Vec::new(),
        PolicyConfig::default(),
    );

    // Detect entities in the input text.
    let text = "Email jane@example.com or call +1 415-555-1212.";
    let result = analyzer.analyze(text, &Language::from("en")).unwrap();

    // Choose an explicit operator for each entity type.
    let mut config = AnonymizeConfig::default();
    let mut per_entity = HashMap::new();
    per_entity.insert("Email".to_string(), Operator::Replace { with: "<EMAIL>".into() });
    per_entity.insert("Phone".to_string(), Operator::Mask { ch: '*', from_end: 4 });
    config.per_entity = per_entity;

    // Apply the operators to the detected entities and print the redacted text.
    let redacted = Anonymizer::anonymize(text, &result.entities, &config).unwrap();
    println!("{}", redacted.text);
}
Where to go next
- Learn the pipeline and entity model in Fundamentals.
- Review the deterministic offset and audit rules in Architecture.
- Use the API reference to build custom recognizers or policies.
- See real-world patterns in Scenarios.