Determinism and Auditability
Worka PII is built for reproducible redaction. Determinism is enforced in three ways:
- Stable byte offsets are computed directly from the original UTF-8 input.
- Candidate ordering is stable and does not depend on hash maps or runtime iteration order.
- Overlap resolution uses explicit scoring rules with deterministic tie-breakers.
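The last two points can be illustrated with a minimal sketch. It is not Worka PII's actual implementation: the `Candidate` type, the `resolve_overlaps` function, and the specific tie-break order (score, then span length, then start offset, then label) are assumptions chosen for illustration. The point is that every comparison is a total order over explicit fields, so the result does not depend on input or iteration order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical detection candidate; field names are illustrative."""
    start: int    # byte offset into the original input, inclusive
    end: int      # byte offset into the original input, exclusive
    label: str
    score: float


def priority(c: Candidate):
    # Assumed rule: higher score wins; ties go to the longer span,
    # then the earlier start, then the lexicographically smaller label.
    return (-c.score, -(c.end - c.start), c.start, c.label)


def resolve_overlaps(candidates):
    """Greedily keep the best non-overlapping candidates."""
    kept = []
    for c in sorted(candidates, key=priority):
        # Keep c only if it does not intersect any span already kept.
        if all(c.end <= k.start or c.start >= k.end for k in kept):
            kept.append(c)
    # Emit in document order so output is stable for callers.
    return sorted(kept, key=lambda c: (c.start, c.end, c.label))


a = Candidate(0, 5, "EMAIL", 0.9)
b = Candidate(3, 8, "PHONE", 0.9)   # overlaps a, same score and length
c = Candidate(10, 14, "NAME", 0.5)

# Shuffling the input does not change the outcome.
assert resolve_overlaps([a, b, c]) == resolve_overlaps([c, b, a])
```

Here `a` and `b` tie on score and length, so the earlier start decides and `a` is kept in both runs.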
Stable offsets
Offsets are byte offsets into the original UTF-8 input, never positions in a transformed copy. A detection therefore refers to the same substring regardless of downstream transforms, which makes it safe to audit and to re-apply redaction.
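The difference between byte offsets and character indices only shows up with non-ASCII input. The sketch below uses plain Python rather than any Worka PII API; the `byte_span` and `slice_bytes` helpers are hypothetical names introduced for the example.

```python
def byte_span(text: str, needle: str):
    """Return the (start, end) byte offsets of needle in UTF-8 encoded text."""
    data = text.encode("utf-8")
    raw = needle.encode("utf-8")
    start = data.index(raw)
    return start, start + len(raw)


def slice_bytes(text: str, start: int, end: int) -> str:
    """Recover a substring from byte offsets into the UTF-8 input."""
    return text.encode("utf-8")[start:end].decode("utf-8")


text = "Zo\u00eb: zoe@example.com"   # "ë" takes two bytes in UTF-8
needle = "zoe@example.com"

char_start = text.index(needle)       # index in code points
start, end = byte_span(text, needle)  # offsets in bytes

# The two numbering schemes disagree once a multi-byte character appears.
assert char_start != start
# The byte span always recovers the same substring from the original input.
assert slice_bytes(text, start, end) == needle
```

Any consumer that stores or compares spans needs to use the same convention, otherwise a span can point at the wrong substring or split a multi-byte character.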
Audit log
The anonymizer returns both the redacted text and a list of audit items, each containing:
- the original entity span,
- the operator applied,
- the replacement value.
This makes it easy to log or validate redaction decisions without re-running detection.
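One way such a log could be validated is to replay it against the original input and compare the result with the redacted text. The sketch below assumes a hypothetical `AuditItem` shape with byte-offset spans; the field names and the `replay` helper are illustrative and are not taken from the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditItem:
    """Hypothetical audit record; field names are assumptions."""
    start: int         # byte offset into the original input, inclusive
    end: int           # byte offset into the original input, exclusive
    operator: str      # name of the operator applied, e.g. "replace"
    replacement: str   # value written in place of the span


def replay(original: str, items) -> str:
    """Rebuild the redacted text from the original input and the audit items."""
    data = original.encode("utf-8")
    out = bytearray()
    cursor = 0
    for item in sorted(items, key=lambda i: (i.start, i.end)):
        if item.start < cursor or item.end < item.start or item.end > len(data):
            raise ValueError("overlapping or out-of-range span")
        out += data[cursor:item.start]             # untouched text before the span
        out += item.replacement.encode("utf-8")    # redacted value
        cursor = item.end
    out += data[cursor:]                           # untouched tail
    return out.decode("utf-8")


original = "Mail zoe@example.com now"
log = [AuditItem(start=5, end=20, operator="replace", replacement="<EMAIL>")]

assert replay(original, log) == "Mail <EMAIL> now"
```

If the replayed text differs from the stored redacted text, either the log or the output has been altered, and that can be detected without running detection again.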