Troubleshooting and Incidents

When something breaks in Worka, start by identifying which layer is actually failing.

Most incidents fall into one of these groups:

  • user access or authorization
  • missing or broken connection state
  • blocked approvals
  • pack invocation failure
  • service deployment or health failure
  • publication or attachment failure
  • stale or incorrect client projection
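Making that bucketing explicit can speed up triage. Below is a minimal sketch; the symptom strings and the lookup logic are illustrative assumptions, not anything from Worka itself:

```python
# Hypothetical mapping from observed symptom text to the failing layer.
# Every key here is an assumed example phrase, not a real Worka error string.
SYMPTOM_TO_LAYER = {
    "permission denied": "user access or authorization",
    "connection not found": "missing or broken connection state",
    "awaiting approval": "blocked approvals",
    "tool call rejected": "pack invocation failure",
    "health check failing": "service deployment or health failure",
    "attach failed": "publication or attachment failure",
    "ui shows old state": "stale or incorrect client projection",
}

def triage(symptom: str) -> str:
    """Return the most likely failing layer for a symptom description."""
    s = symptom.lower()
    for needle, layer in SYMPTOM_TO_LAYER.items():
        if needle in s:
            return layer
    return "unknown - gather more detail before diagnosing"

print(triage("Tool call rejected by broker"))  # pack invocation failure
```

The point of the sketch is the shape, not the strings: decide the layer first, then investigate inside it.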

If you mix those together, you burn time looking in the wrong place.

Start with the user-visible symptom

Ask:

  • what did the user try to do
  • where in the product did it fail
  • did the failure happen before work started, while work was running, or after work supposedly completed

Those three timings already tell you a lot.
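One plausible way to encode that timing signal, purely as an illustration (the phase names and the "first place to look" mapping are assumptions, not Worka guidance):

```python
from enum import Enum

class Phase(Enum):
    """When in the task lifecycle the failure was observed."""
    BEFORE_START = "before work started"
    DURING = "while work was running"
    AFTER_COMPLETE = "after work supposedly completed"

# Assumed starting points per phase; adjust to your own runbooks.
FIRST_CHECK = {
    Phase.BEFORE_START: "authorization, connections, attachments",
    Phase.DURING: "approvals, retries, pack invocation",
    Phase.AFTER_COMPLETE: "publication, projection staleness",
}

print(FIRST_CHECK[Phase.DURING])  # approvals, retries, pack invocation
```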

Common patterns

The workspace is waiting forever

Check:

  • whether an approval is pending
  • whether the task is deferred or retrying
  • whether a required connection or pack attachment is missing
  • whether the client is showing stale projection state
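These four checks can be run as one pass over a workspace status record. The field names below are hypothetical placeholders, not Worka's real schema:

```python
def why_waiting(ws: dict) -> list[str]:
    """Collect every reason a workspace might appear stuck.

    `ws` is an assumed status record; all keys here are illustrative.
    """
    reasons = []
    if ws.get("pending_approvals"):
        reasons.append("approval pending")
    if ws.get("task_state") in {"deferred", "retrying"}:
        reasons.append(f"task is {ws['task_state']}")
    if ws.get("missing_attachments"):
        reasons.append("required connection or pack attachment missing")
    if ws.get("projection_version", 0) < ws.get("server_version", 0):
        reasons.append("client projection is stale")
    return reasons or ["no obvious blocker - inspect logs"]

print(why_waiting({"pending_approvals": ["apr-1"], "task_state": "retrying"}))
# ['approval pending', 'task is retrying']
```

Collecting every reason, rather than stopping at the first, matters: a workspace is often blocked by two things at once.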

A pack exists but tool calls fail

Check:

  • whether the pack is attached to the workspace
  • whether the release is actually installable and active
  • whether the tool names discovered at registration match the tool names being invoked
  • whether the broker rejected the call because of capability, connection, or outbound policy
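The third check, a registration/invocation name mismatch, is cheap to verify mechanically. A minimal sketch, assuming you can export both name sets (the example names are made up):

```python
def tool_name_mismatches(registered: set[str], invoked: set[str]) -> set[str]:
    """Tool names being invoked that were never registered."""
    return invoked - registered

# Hypothetical example: a hyphen/underscore drift between registration
# and invocation is a classic cause of "pack exists but calls fail".
registered = {"search_docs", "create_ticket"}
invoked = {"search_docs", "create-ticket"}
print(tool_name_mismatches(registered, invoked))  # {'create-ticket'}
```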

A shared view is available but wrong

Check:

  • whether the wrong audience is attached
  • whether a public/private toggle was changed recently
  • whether the underlying workflow or service is healthy
  • whether the data model the view depends on is actually being populated
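The same pass-over-a-record pattern works here. The view record and its fields below are assumptions for illustration only:

```python
def view_issues(view: dict, expected_audience: str) -> list[str]:
    """Flag the quick-to-check causes of a wrong shared view.

    `view` is a hypothetical status record; keys are illustrative.
    """
    issues = []
    if view.get("audience") != expected_audience:
        issues.append(f"wrong audience attached: {view.get('audience')}")
    if view.get("visibility_changed_recently"):
        issues.append("public/private toggle changed recently")
    if not view.get("source_healthy", True):
        issues.append("underlying workflow or service unhealthy")
    if not view.get("data_populated", True):
        issues.append("backing data model is not being populated")
    return issues

print(view_issues({"audience": "team"}, expected_audience="org"))
# ['wrong audience attached: team']
```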

A service exists but is not healthy

Check:

  • deployment state
  • recent logs
  • health check result
  • dependency and secret state
  • whether the route is valid and the service is reachable at its assigned domain

During an incident

Work in this order:

  1. stabilise the affected capability
  2. confirm scope: one user, one workspace, one tenant, or platform-wide
  3. identify the failing layer
  4. capture the relevant IDs, timestamps, and actor names
  5. mitigate or roll back
  6. only then begin deeper diagnosis
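Step 4 is the one most often skipped under pressure. A minimal record type can make the capture mechanical; every field name here is an assumption, not a Worka schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IncidentRecord:
    """Capture the scope, layer, IDs, and actors before mitigating."""
    scope: str                  # "one user", "one workspace", "one tenant", "platform-wide"
    failing_layer: str          # one of the groups listed at the top
    entity_ids: list[str] = field(default_factory=list)
    actors: list[str] = field(default_factory=list)
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

rec = IncidentRecord(scope="one workspace", failing_layer="pack invocation failure")
rec.entity_ids.append("ws-example-123")  # hypothetical ID
```

Writing this down before mitigation means the rollback does not destroy the evidence you need for step 6.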

If you start with forensic detail before containment, you risk turning a narrow incident into a wider outage.