Pipeline Architecture

Worka PII builds a deterministic pipeline from three layers: NLP artifacts, recognizers, and a decision engine. Each layer is explicit and testable.

Input Text
-> NLP Engine (tokens, offsets, optional lemma/POS/NER)
-> Recognizers (regex, validator, dictionary, NER)
-> Context Enhancers (optional score boosts)
-> Decision Engine (thresholding + overlap resolution)
-> Detections
-> Anonymizer (operators per entity)
-> Redacted Output + Audit Items
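
The flow above can be sketched end to end in a few functions. This is a minimal illustration, not Worka PII's actual API: every name here (`Detection`, `recognize`, `enhance`, `decide`, `anonymize`, `run_pipeline`), the email pattern, the scores, and the threshold are assumptions chosen for the example. Overlap resolution is left out to keep the sketch short.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    entity: str
    start: int
    end: int
    score: float


# Hypothetical, deliberately simple email pattern for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def recognize(text):
    """Recognizer stage: emit candidates with a base score (assumed 0.6)."""
    return [Detection("EMAIL", m.start(), m.end(), 0.6) for m in EMAIL.finditer(text)]


def enhance(text, detections, keywords=("email", "contact"), window=30, boost=0.3):
    """Context enhancer stage: boost the score if a keyword precedes the match."""
    out = []
    for d in detections:
        context = text[max(0, d.start - window):d.start].lower()
        bonus = boost if any(k in context for k in keywords) else 0.0
        out.append(Detection(d.entity, d.start, d.end, min(1.0, d.score + bonus)))
    return out


def decide(detections, threshold=0.8):
    """Decision stage: keep only candidates at or above the threshold."""
    return [d for d in detections if d.score >= threshold]


def anonymize(text, detections):
    """Anonymizer stage: replace each span with a tag and record an audit item."""
    audit = []
    # Replace from right to left so earlier offsets stay valid.
    for d in sorted(detections, key=lambda d: d.start, reverse=True):
        audit.append({"entity": d.entity, "start": d.start, "end": d.end})
        text = text[:d.start] + f"<{d.entity}>" + text[d.end:]
    return text, audit


def run_pipeline(text):
    candidates = recognize(text)
    boosted = enhance(text, candidates)
    kept = decide(boosted)
    return anonymize(text, kept)
```

In this sketch, `run_pipeline("Contact email: a@b.com")` redacts the address because the nearby keyword lifts the score over the threshold, while the same address without supporting context stays below it and is left untouched.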

Capability-aware processing

Not every language provides the same NLP features. Worka PII defines capabilities for tokenization, lemma, POS, and NER. If a capability is missing, the pipeline degrades predictably instead of failing. For example, recognizers that rely on NER do not run if NER is unavailable, while regex recognizers still execute.
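
One way to picture this degradation is as a capability check before each recognizer runs. The sketch below is an assumption about how such a check could look, not the library's real types: `Capability`, `Recognizer`, and `runnable` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Capability(Flag):
    NONE = 0
    TOKENIZE = auto()
    LEMMA = auto()
    POS = auto()
    NER = auto()


@dataclass(frozen=True)
class Recognizer:
    name: str
    requires: Capability  # capabilities this recognizer needs from the NLP engine


def runnable(recognizers, available):
    """Return the recognizers whose required capabilities are all available."""
    return [r for r in recognizers if (r.requires & available) == r.requires]


# Hypothetical recognizer set: regex needs nothing, NER needs tokens and NER.
RECOGNIZERS = [
    Recognizer("regex", Capability.NONE),
    Recognizer("ner", Capability.TOKENIZE | Capability.NER),
]
```

For a language that only offers tokenization, `runnable(RECOGNIZERS, Capability.TOKENIZE)` keeps the regex recognizer and skips the NER one, so the pipeline still produces output instead of raising an error.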

Deterministic ordering

Candidates are normalized into a stable order before resolution. The decision engine applies a fixed set of tie-breaking rules so the same input yields the same output even when multiple recognizers overlap.
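
The idea can be illustrated with a total ordering over candidates followed by a greedy overlap pass. The specific tie-breaking rules below (higher score, then longer span, then earlier start, then entity and recognizer name) are assumptions for the example; the source does not list the rules Worka PII actually applies, and `Candidate`, `sort_key`, and `resolve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entity: str
    start: int
    end: int
    score: float
    recognizer: str


def sort_key(c):
    # Assumed priority: higher score, longer span, earlier start,
    # then entity and recognizer names as final tie-breakers.
    return (-c.score, -(c.end - c.start), c.start, c.entity, c.recognizer)


def resolve(candidates, threshold=0.5):
    """Threshold, order candidates totally, then keep non-overlapping winners."""
    kept = []
    eligible = (c for c in candidates if c.score >= threshold)
    for c in sorted(eligible, key=sort_key):
        # Keep the candidate only if it overlaps nothing already accepted.
        if all(c.end <= k.start or c.start >= k.end for k in kept):
            kept.append(c)
    # Report detections in document order.
    return sorted(kept, key=lambda c: (c.start, c.end))
```

Because the sort key distinguishes any two candidates that differ in a field, the result does not depend on the order in which recognizers emitted their candidates. Two recognizers that claim the same span with the same score are separated by the name-based tie-breakers rather than by arrival order.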