
Determinism and Auditability

Worka PII is built for reproducible redaction. Determinism is enforced in three ways:

  1. Stable byte offsets are computed directly from the original UTF-8 input.
  2. Candidate ordering is stable and does not depend on hash maps or runtime iteration order.
  3. Overlap resolution uses explicit scoring rules with deterministic tie-breakers.
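As an illustration of points 2 and 3, here is a minimal sketch of order-independent overlap resolution. It is not Worka PII's actual implementation: the `Candidate` type, its field names, and the particular tie-breakers (higher score, then longer span, then earlier start, then entity label) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical detection candidate; field names are illustrative."""
    start: int    # byte offset into the original input, inclusive
    end: int      # byte offset into the original input, exclusive
    entity: str   # entity label, e.g. "EMAIL"
    score: float  # detector confidence


def priority(c: Candidate):
    # Assumed rules: higher score wins; ties go to the longer span,
    # then the earlier start, then the lexicographically smaller label.
    return (-c.score, -(c.end - c.start), c.start, c.entity)


def resolve_overlaps(candidates):
    """Keep the highest-priority candidates that do not overlap each other."""
    kept = []
    # sorted() with a total-order key gives the same sequence for any input order.
    for c in sorted(candidates, key=priority):
        if all(c.end <= k.start or c.start >= k.end for k in kept):
            kept.append(c)
    # Return in document order so downstream output is stable as well.
    return sorted(kept, key=lambda c: (c.start, c.end, c.entity))


candidates = [
    Candidate(0, 5, "NAME", 0.8),
    Candidate(3, 10, "EMAIL", 0.8),
    Candidate(12, 15, "PHONE", 0.5),
]
resolved = resolve_overlaps(candidates)
```

Because the sort key covers every field, the result does not depend on the order in which detectors emit candidates: passing the same list reversed yields the same `resolved` value.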

Stable offsets

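Offsets are byte offsets into the original UTF-8 input, not character or code-point indices. Because each offset is anchored to the original input rather than to a transformed copy, a detection always refers to the same substring regardless of downstream transforms. This makes it safe to audit and re-apply redaction.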
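The difference between byte offsets and character indices shows up as soon as the input contains a multi-byte character. The snippet below is a plain Python sketch with a made-up sample string; it does not call Worka PII.

```python
# Sample input; "\u00eb" (ë) occupies two bytes in UTF-8.
text = "Zo\u00eb's email is zoe@example.com"
data = text.encode("utf-8")

needle = "zoe@example.com"

# Character index in the decoded string.
char_start = text.find(needle)

# Byte offset in the original UTF-8 input.
byte_start = data.find(needle.encode("utf-8"))
byte_end = byte_start + len(needle.encode("utf-8"))

# Slicing the original bytes with the byte offsets recovers the exact span.
recovered = data[byte_start:byte_end].decode("utf-8")
```

Here `byte_start` is one greater than `char_start` because of the two-byte `ë`, so a consumer that treats the offsets as character indices would slice the wrong span. Slicing the original bytes always yields the detected text.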

Audit log

The anonymizer returns both the redacted text and a list of items, each containing:

  • the original entity span,
  • the operator applied,
  • the replacement value.

This makes it easy to log or validate redaction decisions without re-running detection.
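One way to use such a list for validation is to replay it against the original input and compare the result with the redacted text that was returned. The sketch below assumes a hypothetical `AuditItem` shape with byte-offset spans; the real item type and field names in Worka PII may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditItem:
    """Hypothetical audit entry; field names are illustrative."""
    start: int        # byte offset of the original span, inclusive
    end: int          # byte offset of the original span, exclusive
    operator: str     # name of the operator applied, e.g. "replace"
    replacement: str  # value written in place of the span


def replay(original: str, items) -> str:
    """Rebuild the redacted text from the original input and the audit items."""
    data = original.encode("utf-8")
    out = bytearray()
    cursor = 0
    for item in sorted(items, key=lambda i: (i.start, i.end)):
        if item.start < cursor or item.end > len(data):
            raise ValueError(f"invalid or overlapping span: {item}")
        out += data[cursor:item.start]           # untouched text before the span
        out += item.replacement.encode("utf-8")  # redacted value
        cursor = item.end
    out += data[cursor:]                         # untouched tail
    return out.decode("utf-8")


original = "Zo\u00eb: zoe@example.com"
items = [
    AuditItem(6, 21, "replace", "<EMAIL>"),
    AuditItem(0, 4, "replace", "<NAME>"),
]
redacted = replay(original, items)
```

If `replay(original, items)` equals the redacted text stored alongside the log, the log fully accounts for every change, and no detector has to run again to confirm it.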