Pipeline Architecture
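Worka PII builds a deterministic pipeline from three layers: NLP artifacts, recognizers, and a decision engine. Each layer is explicit and can be tested on its own. Optional context enhancers sit between the recognizers and the decision engine, and an anonymizer turns the final detections into redacted output: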
Input Text
  -> NLP Engine (tokens, offsets, optional lemma/POS/NER)
  -> Recognizers (regex, validator, dictionary, NER)
  -> Context Enhancers (optional score boosts)
  -> Decision Engine (thresholding + overlap resolution)
  -> Detections
  -> Anonymizer (operators per entity)
  -> Redacted Output + Audit Items
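The flow above can be sketched end to end in a few lines of Python. This is an illustrative sketch only, not Worka PII's actual API: the names `Detection`, `recognize`, `enhance`, `decide`, `anonymize`, and `run_pipeline`, the email pattern, and all scores and thresholds are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Detection:
    entity: str
    start: int
    end: int
    score: float


# Hypothetical, deliberately simple email pattern for the sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def recognize(text):
    """Recognizer stage: emit candidates with a base score."""
    return [Detection("EMAIL", m.start(), m.end(), 0.5) for m in EMAIL.finditer(text)]


def enhance(text, candidates, window=20, boost=0.25):
    """Context enhancer stage: boost the score when 'email' appears just before the match."""
    out = []
    for c in candidates:
        context = text[max(0, c.start - window):c.start].lower()
        score = min(1.0, c.score + boost) if "email" in context else c.score
        out.append(replace(c, score=score))
    return out


def decide(candidates, threshold=0.7):
    """Decision stage (thresholding only; overlap resolution is omitted here)."""
    return [c for c in candidates if c.score >= threshold]


def anonymize(text, detections):
    """Anonymizer stage: replace each span with a placeholder and record an audit item."""
    audit = []
    # Replace from right to left so earlier offsets stay valid.
    for d in sorted(detections, key=lambda d: d.start, reverse=True):
        audit.append((d.entity, d.start, d.end))
        text = text[:d.start] + f"<{d.entity}>" + text[d.end:]
    return text, audit


def run_pipeline(text):
    candidates = enhance(text, recognize(text))
    return anonymize(text, decide(candidates))


print(run_pipeline("Email: ann@example.com"))
```

In this sketch the bare regex match scores below the threshold, so an address is redacted only when the surrounding context boosts it. That is one way the "optional score boosts" stage can influence the final decision.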
Capability-aware processing
Not every language provides the same NLP features. Worka PII defines a separate capability for each of tokenization, lemmatization, part-of-speech (POS) tagging, and named-entity recognition (NER). If a capability is missing, the pipeline degrades predictably instead of failing. For example, recognizers that rely on NER do not run if NER is unavailable, while regex recognizers still execute.
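One way to model this behavior is to let each recognizer declare the capabilities it requires and to skip any recognizer whose requirements are not met. The sketch below assumes hypothetical names (`Capability`, `Recognizer`, `runnable`) and is not taken from Worka PII's code.

```python
from dataclasses import dataclass
from enum import Enum


class Capability(Enum):
    TOKENIZE = "tokenize"
    LEMMA = "lemma"
    POS = "pos"
    NER = "ner"


@dataclass(frozen=True)
class Recognizer:
    name: str
    requires: frozenset = frozenset()  # capabilities this recognizer depends on


def runnable(recognizers, available):
    """Return the names of recognizers whose requirements are all available."""
    return [r.name for r in recognizers if r.requires <= available]


recognizers = [
    Recognizer("email_regex"),  # regex only, no NLP requirements
    Recognizer("person_ner", frozenset({Capability.TOKENIZE, Capability.NER})),
]

# A language with tokenization but no NER model: only the regex recognizer runs.
print(runnable(recognizers, frozenset({Capability.TOKENIZE})))
```

Skipping a recognizer rather than raising an error keeps the result predictable: the set of recognizers that ran depends only on the declared capabilities.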
Deterministic ordering
Candidates are normalized into a stable order before resolution. The decision engine then applies a fixed set of tie-breaking rules, so the same input yields the same output even when candidates from multiple recognizers overlap.
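The idea can be illustrated with a total ordering over candidates followed by a greedy overlap pass. The specific tie-breaking rules used here (higher score, then longer span, then earlier start, then entity and recognizer name) are assumptions for the sketch; the text above does not state which rules Worka PII applies. The names `Candidate`, `priority`, and `resolve` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entity: str
    start: int
    end: int
    score: float
    recognizer: str


def priority(c):
    # Total order: no two distinct candidates compare equal, so the sort
    # result does not depend on the order in which recognizers ran.
    return (-c.score, -(c.end - c.start), c.start, c.entity, c.recognizer)


def overlaps(a, b):
    return a.start < b.end and b.start < a.end


def resolve(candidates, threshold=0.5):
    """Drop low-scoring candidates, then keep the best of each overlapping group."""
    kept = []
    for c in sorted(candidates, key=priority):
        if c.score >= threshold and not any(overlaps(c, k) for k in kept):
            kept.append(c)
    # Report detections in document order.
    return sorted(kept, key=lambda c: (c.start, c.end, c.entity))


phone = Candidate("PHONE", 0, 10, 0.8, "regex")
ident = Candidate("ID", 0, 10, 0.8, "dictionary")
email = Candidate("EMAIL", 20, 30, 0.9, "regex")

# Input order does not affect the result.
print(resolve([phone, ident, email]) == resolve([email, ident, phone]))
```

Because `priority` includes the entity and recognizer names as final tie-breakers, two candidates with identical spans and scores are still resolved the same way on every run.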