Troubleshooting and Incidents

When something breaks in Worka, start by identifying which layer is actually failing.

Most incidents fall into one of these groups:

  • user access or authorization
  • missing or broken connection state
  • blocked approvals
  • pack invocation failure
  • service deployment or health failure
  • publication or attachment failure
  • stale or incorrect client projection
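Making that bucketing explicit can speed up triage. Below is a minimal sketch; the symptom strings and the lookup logic are illustrative assumptions, not anything from Worka itself:

```python
# Hypothetical mapping from observed symptom text to the failing layer.
# Every key here is an assumed example phrase, not a real Worka error string.
SYMPTOM_TO_LAYER = {
    "permission denied": "user access or authorization",
    "connection not found": "missing or broken connection state",
    "awaiting approval": "blocked approvals",
    "tool call rejected": "pack invocation failure",
    "health check failing": "service deployment or health failure",
    "attach failed": "publication or attachment failure",
    "ui shows old state": "stale or incorrect client projection",
}

def triage(symptom: str) -> str:
    """Return the most likely failing layer for a symptom description."""
    s = symptom.lower()
    for needle, layer in SYMPTOM_TO_LAYER.items():
        if needle in s:
            return layer
    return "unknown - gather more detail before diagnosing"

print(triage("Tool call rejected by broker"))  # pack invocation failure
```

The point of the sketch is the shape, not the strings: decide the layer first, then investigate inside it.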

If you mix those together, you burn time looking in the wrong place.

Start with the user-visible symptom

Ask:

  • what did the user try to do
  • where in the product did it fail
  • did the failure happen before work started, while work was running, or after work supposedly completed

Those three timings already tell you a lot.
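One plausible way to encode that timing signal, purely as an illustration (the phase names and the "first place to look" mapping are assumptions, not Worka guidance):

```python
from enum import Enum

class Phase(Enum):
    """When in the task lifecycle the failure was observed."""
    BEFORE_START = "before work started"
    DURING = "while work was running"
    AFTER_COMPLETE = "after work supposedly completed"

# Assumed starting points per phase; adjust to your own runbooks.
FIRST_CHECK = {
    Phase.BEFORE_START: "authorization, connections, attachments",
    Phase.DURING: "approvals, retries, pack invocation",
    Phase.AFTER_COMPLETE: "publication, projection staleness",
}

print(FIRST_CHECK[Phase.DURING])  # approvals, retries, pack invocation
```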

Common patterns

The workspace is waiting forever

Check:

  • whether an approval is pending
  • whether the task is deferred or retrying
  • whether a required connection or pack attachment is missing
  • whether the client is showing stale projection state
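These four checks can be run as one pass over a workspace status record. The field names below are hypothetical placeholders, not Worka's real schema:

```python
def why_waiting(ws: dict) -> list[str]:
    """Collect every reason a workspace might appear stuck.

    `ws` is an assumed status record; all keys here are illustrative.
    """
    reasons = []
    if ws.get("pending_approvals"):
        reasons.append("approval pending")
    if ws.get("task_state") in {"deferred", "retrying"}:
        reasons.append(f"task is {ws['task_state']}")
    if ws.get("missing_attachments"):
        reasons.append("required connection or pack attachment missing")
    if ws.get("projection_version", 0) < ws.get("server_version", 0):
        reasons.append("client projection is stale")
    return reasons or ["no obvious blocker - inspect logs"]

print(why_waiting({"pending_approvals": ["apr-1"], "task_state": "retrying"}))
# ['approval pending', 'task is retrying']
```

Collecting every reason, rather than stopping at the first, matters: a workspace is often blocked by two things at once.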

A pack exists but tool calls fail

Check:

  • whether the pack is attached to the workspace
  • whether the release is actually installable and active
  • whether the tool names discovered at registration match the tool names being invoked
  • whether the broker rejected the call because of capability, connection, or outbound policy
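The third check, a registration/invocation name mismatch, is cheap to verify mechanically. A minimal sketch, assuming you can export both name sets (the example names are made up):

```python
def tool_name_mismatches(registered: set[str], invoked: set[str]) -> set[str]:
    """Tool names being invoked that were never registered."""
    return invoked - registered

# Hypothetical example: a hyphen/underscore drift between registration
# and invocation is a classic cause of "pack exists but calls fail".
registered = {"search_docs", "create_ticket"}
invoked = {"search_docs", "create-ticket"}
print(tool_name_mismatches(registered, invoked))  # {'create-ticket'}
```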

A shared view is available but wrong

Check:

  • whether the wrong audience is attached
  • whether a public/private toggle was changed recently
  • whether the underlying workflow or service is healthy
  • whether the data model the view depends on is actually being populated
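The same pass-over-a-record pattern works here. The view record and its fields below are assumptions for illustration only:

```python
def view_issues(view: dict, expected_audience: str) -> list[str]:
    """Flag the quick-to-check causes of a wrong shared view.

    `view` is a hypothetical status record; keys are illustrative.
    """
    issues = []
    if view.get("audience") != expected_audience:
        issues.append(f"wrong audience attached: {view.get('audience')}")
    if view.get("visibility_changed_recently"):
        issues.append("public/private toggle changed recently")
    if not view.get("source_healthy", True):
        issues.append("underlying workflow or service unhealthy")
    if not view.get("data_populated", True):
        issues.append("backing data model is not being populated")
    return issues

print(view_issues({"audience": "team"}, expected_audience="org"))
# ['wrong audience attached: team']
```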

A service exists but is not healthy

Check:

  • deployment state
  • recent logs
  • health check result
  • dependency and secret state
  • whether the route is valid and the service is reachable at its assigned domain

During an incident

Work in this order:

  1. stabilise the affected capability
  2. confirm scope: one user, one workspace, one tenant, or platform-wide
  3. identify the failing layer
  4. capture the relevant IDs, timestamps, and actor names
  5. mitigate or roll back
  6. only then begin deeper diagnosis
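Step 4 is the one most often skipped under pressure. A minimal record type can make the capture mechanical; every field name here is an assumption, not a Worka schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IncidentRecord:
    """Capture the scope, layer, IDs, and actors before mitigating."""
    scope: str                  # "one user", "one workspace", "one tenant", "platform-wide"
    failing_layer: str          # one of the groups listed at the top
    entity_ids: list[str] = field(default_factory=list)
    actors: list[str] = field(default_factory=list)
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

rec = IncidentRecord(scope="one workspace", failing_layer="pack invocation failure")
rec.entity_ids.append("ws-example-123")  # hypothetical ID
```

Writing this down before mitigation means the rollback does not destroy the evidence you need for step 6.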

If you start with forensic detail before containment, you risk turning a narrow incident into a wider outage.