Troubleshooting and Incidents
When something breaks in Worka, start by identifying which layer is actually failing.
Most incidents fall into one of these groups:
- user access or authorization
- missing or broken connection state
- blocked approvals
- pack invocation failure
- service deployment or health failure
- publication or attachment failure
- stale or incorrect client projection
If you mix those together, you burn time looking in the wrong place.
Start with the user-visible symptom
Ask:
- what did the user try to do?
- where in the product did it fail?
- did the failure happen before work started, while work was running, or after work supposedly completed?
The timing alone narrows the search: a failure before work starts, during a run, and after apparent completion each point to different layers.
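If the task's start and completion times are available, the timing question can be answered mechanically. The sketch below is an illustration only: `FailurePhase`, `failure_phase`, and the three timestamp parameters are hypothetical names, and it assumes you can obtain those timestamps from wherever your task records are kept.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class FailurePhase(Enum):
    """The three timings from the question above (hypothetical labels)."""
    BEFORE_START = "before work started"
    WHILE_RUNNING = "while work was running"
    AFTER_COMPLETION = "after work supposedly completed"


def failure_phase(
    observed_at: datetime,
    started_at: Optional[datetime],
    completed_at: Optional[datetime],
) -> FailurePhase:
    """Classify a symptom by when it was observed relative to the task.

    Assumption: a missing timestamp means that event never happened.
    """
    if started_at is None or observed_at < started_at:
        return FailurePhase.BEFORE_START
    if completed_at is None or observed_at < completed_at:
        return FailurePhase.WHILE_RUNNING
    return FailurePhase.AFTER_COMPLETION
```

Recording the phase next to the symptom keeps later readers of the incident notes from re-deriving it.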
Common patterns
The workspace is waiting forever
Check:
- whether an approval is pending
- whether the task is deferred or retrying
- whether a required connection or pack attachment is missing
- whether the client is showing stale projection state
A pack exists but tool calls fail
Check:
- whether the pack is attached to the workspace
- whether the release is actually installable and active
- whether the tool names discovered at registration match the tool names being invoked
- whether the broker rejected the call because of capability, connection, or outbound policy
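The tool-name check in particular is easy to automate once you have both lists in hand. The following is a minimal sketch, not a Worka API: `unmatched_tool_names` is a hypothetical helper, and it assumes you have already exported the registered names and the invoked names as plain strings.

```python
import difflib
from typing import Dict, Iterable, List


def unmatched_tool_names(
    registered: Iterable[str],
    invoked: Iterable[str],
) -> Dict[str, List[str]]:
    """Map each invoked name that was never registered to its nearest registered names.

    An empty result means every invoked name matches a registered one exactly.
    """
    known = sorted(set(registered))
    report: Dict[str, List[str]] = {}
    for name in sorted(set(invoked) - set(known)):
        # Near matches usually indicate a casing or separator mismatch.
        report[name] = difflib.get_close_matches(name, known, n=3, cutoff=0.6)
    return report


if __name__ == "__main__":
    # Illustrative names only.
    print(unmatched_tool_names(["search_docs", "create_ticket"],
                               ["search_docs", "createTicket"]))
```

An invoked name with no near match at all suggests the tool was never registered, which points back to the attachment and release checks rather than to naming.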
A shared view is available but wrong
Check:
- whether the wrong audience is attached
- whether a public/private toggle was changed recently
- whether the underlying workflow or service is healthy
- whether the data model the view depends on is actually being populated
A service exists but is not healthy
Check:
- deployment state
- recent logs
- health check result
- dependency and secret state
- whether the route is valid and the service is reachable at its assigned domain
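The last check, reachability at the assigned domain, can be separated from the others with a plain HTTP probe. This is a sketch under assumptions: `probe` and `ProbeResult` are hypothetical names, the URL is whatever domain the service was assigned, and the probe says nothing about deployment state, logs, or secrets.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    reachable: bool        # did anything answer at that address?
    status: Optional[int]  # HTTP status code, if one came back
    detail: str


def probe(
    url: str,
    timeout: float = 5.0,
    opener: Callable = urllib.request.urlopen,  # injectable so the probe can be tested offline
) -> ProbeResult:
    """Distinguish 'nothing answers' from 'something answers with an error'."""
    try:
        with opener(url, timeout=timeout) as response:
            return ProbeResult(True, response.status, "ok")
    except urllib.error.HTTPError as exc:
        # The route resolved and a server replied, but with an error status.
        return ProbeResult(True, exc.code, str(exc.reason))
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, or timeout: nothing reachable.
        return ProbeResult(False, None, str(exc))
```

A result with `reachable=False` points at routing or the domain assignment; a reachable service returning an error status points back at deployment state, dependencies, or secrets.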
During an incident
Work in this order:
1. stabilise the affected capability
2. confirm scope: one user, one workspace, one tenant, or platform-wide
3. identify the failing layer
4. capture the relevant IDs, timestamps, and actor names
5. mitigate or roll back
6. only then begin deeper diagnosis
If you start with forensic detail before containment, you risk turning a narrow incident into a wider outage.
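One way to hold yourself to that order is to keep a small incident log that refuses out-of-order steps. The sketch below is hypothetical throughout: `Step`, `IncidentLog`, and the evidence keys are illustrative names, and strict enforcement is a design choice you may want to relax for your own process.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum
from typing import Dict, List


class Step(IntEnum):
    """The incident order described above; the values only encode that order."""
    STABILISE = 1
    CONFIRM_SCOPE = 2
    IDENTIFY_LAYER = 3
    CAPTURE_EVIDENCE = 4
    MITIGATE_OR_ROLL_BACK = 5
    DEEPER_DIAGNOSIS = 6


@dataclass
class IncidentLog:
    completed: List[Step] = field(default_factory=list)
    evidence: Dict[str, str] = field(default_factory=dict)  # IDs, timestamps, actor names

    def complete(self, step: Step) -> None:
        """Mark a step done, rejecting anything other than the next step in order."""
        if len(self.completed) == len(Step):
            raise RuntimeError("all steps are already complete")
        expected = Step(len(self.completed) + 1)
        if step is not expected:
            raise RuntimeError(f"{step.name} attempted before {expected.name}")
        self.completed.append(step)

    def capture(self, key: str, value: str) -> None:
        """Store a piece of evidence with the UTC time it was recorded."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.evidence[key] = f"{value} (recorded {stamp})"
```

With this shape, attempting `DEEPER_DIAGNOSIS` right after `STABILISE` raises an error instead of quietly skipping scope confirmation and mitigation.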