
How workflows work

Execution model, reliability, and operational guarantees.

Lightfield workflows run on a durable execution engine designed to process millions of events daily. This page covers the architecture at a high level for teams evaluating Lightfield’s reliability and operational guarantees.

Workflow steps can take a long time. An AI agent reasoning over a complex payload might run for 60 seconds; an HTTP request to a slow external API might take 30 seconds. Lightfield's execution engine is designed around this reality.

Slow work (external calls, AI execution) is separated from fast work (state updates, scheduling the next step). The engine stays responsive regardless of how long individual steps take, and no step blocks another from making progress.
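A minimal sketch of this separation, using hypothetical names (`advance`, `slow_work`): the fast path updates state and schedules the next step synchronously, while slow work is handed to a queue for separate workers to drain, so nothing blocks the engine.

```python
import queue

# Fast work is done inline; slow work goes onto a queue for workers.
slow_work = queue.Queue()
state = {}

def advance(run_id, step):
    # Fast work: update execution state and schedule the next step.
    state[run_id] = {"current_step": step, "status": "running"}
    # Slow work: hand the external call to a worker instead of blocking here.
    slow_work.put((run_id, step))

advance("run-1", "call-external-api")
print(state["run-1"]["current_step"])  # call-external-api
print(slow_work.qsize())               # 1 item waiting for a worker
```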

Each step’s side effects are idempotent. If a step is retried after a transient failure, it produces the same result without duplicating records or sending duplicate requests.
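The idea can be sketched as keyed writes: a hypothetical `write_record` helper remembers each idempotency key, so a retried step replays as a no-op instead of creating a second record.

```python
# Side effects are keyed; a retry with the same key returns the original
# result rather than performing the write again.
written = {}

def write_record(idempotency_key, payload):
    if idempotency_key in written:
        return written[idempotency_key]   # retry: reuse the first result
    written[idempotency_key] = payload    # first attempt: perform the write
    return payload

write_record("step-7:create-contact", {"name": "Ada"})
write_record("step-7:create-contact", {"name": "Ada"})  # retried after a failure
print(len(written))  # 1 record, not 2
```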

When a trigger fires (webhook received, record changed, schedule ticks), the event is captured atomically. Events are never lost, even if the system crashes immediately after the trigger.
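One common way to get this property is to commit the record change and its trigger event in a single transaction, so a crash can never persist one without the other. A sketch with SQLite standing in for the real store (table names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (id TEXT PRIMARY KEY, stage TEXT)")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, record_id TEXT, kind TEXT)")

# One transaction covers both the change and the event that announces it.
with db:
    db.execute("INSERT INTO records VALUES ('opp-1', 'won')")
    db.execute("INSERT INTO events (record_id, kind) VALUES ('opp-1', 'record_changed')")

print(db.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 1
```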

A sharded background processor dispatches events to workflow executions. Sharding ensures that a single high-volume workflow can’t starve others. Each event is processed with a scoped idempotency key, so replaying events after a failure never creates duplicate workflow runs.
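A sketch of both ideas, with hypothetical names: events are routed to a shard by workflow, and run creation is deduplicated by a (workflow, event) key, so a replayed event maps back to the run it already created.

```python
NUM_SHARDS = 8
runs = {}

def shard_for(workflow_id):
    # A hot workflow hashes to one shard and can't starve the others.
    return hash(workflow_id) % NUM_SHARDS

def dispatch(workflow_id, event_id):
    key = (workflow_id, event_id)     # scoped idempotency key
    if key in runs:
        return runs[key]              # replayed event: reuse the existing run
    runs[key] = f"run-{len(runs) + 1}"
    return runs[key]

first = dispatch("wf-hot", "evt-42")
replay = dispatch("wf-hot", "evt-42")  # replay after a crash
print(first == replay)  # True: exactly one workflow run
```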

For lifecycle triggers, event matching is precise. If a trigger watches the stage and amount fields on an opportunity, and an update only changes the description field, the trigger doesn’t fire. This filtering happens before a workflow ever starts executing.
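The filtering rule amounts to a set intersection between watched fields and changed fields; a sketch with a hypothetical `should_fire` helper:

```python
def should_fire(watched_fields, changed_fields):
    # Fire only if at least one watched field actually changed.
    return bool(watched_fields & changed_fields)

watched = {"stage", "amount"}
print(should_fire(watched, {"description"}))     # False: no watched field changed
print(should_fire(watched, {"stage", "owner"}))  # True: stage is watched
```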

| Guarantee | How |
| --- | --- |
| Events are never lost | Trigger events are captured atomically with the originating change |
| No duplicate executions | Scoped idempotency keys on every event and every write |
| Safe concurrent execution | Compare-and-swap concurrency control prevents state corruption |
| Safe to edit active workflows | Immutable version snapshots; running executions are pinned to their start version |
| Automatic retries | Transient failures (network errors, 502/503/504) retry with exponential backoff |

Errors are classified as permanent or retryable. Transient failures (network timeouts, HTTP 502/503/504) are retried automatically with backoff. Permanent failures (invalid configuration, template resolution errors) fail the step immediately and skip subsequent steps.
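A sketch of the classification and backoff schedule. The status codes match the ones listed above; the classifier shape, error fields, and delay values are illustrative, not Lightfield's actual configuration.

```python
RETRYABLE_STATUSES = {502, 503, 504}

def classify(error):
    # Transient failures are retried; everything else fails immediately.
    if error.get("type") == "network_timeout":
        return "retryable"
    if error.get("status") in RETRYABLE_STATUSES:
        return "retryable"
    return "permanent"

def backoff_delays(base=1.0, attempts=4):
    # Exponential backoff: each retry waits twice as long as the last.
    return [base * 2 ** n for n in range(attempts)]

print(classify({"status": 503}))             # retryable
print(classify({"type": "invalid_config"}))  # permanent
print(backoff_delays())                      # [1.0, 2.0, 4.0, 8.0]
```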

All errors carry structured, namespaced error codes and metadata, making them queryable and debuggable without parsing log messages.
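What "queryable" buys you can be sketched as filtering on a namespaced code rather than grepping messages. The code format and metadata fields below are illustrative assumptions, not Lightfield's actual error schema.

```python
errors = [
    {"code": "http.request_timeout", "meta": {"step": 2, "url": "https://api.example.com"}},
    {"code": "template.missing_variable", "meta": {"step": 1, "variable": "email"}},
    {"code": "http.bad_gateway", "meta": {"step": 4, "status": 502}},
]

# Filter by namespace instead of parsing free-text log messages.
http_errors = [e for e in errors if e["code"].startswith("http.")]
print(len(http_errors))  # 2
```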

Every workflow run carries a trace ID that correlates across the full execution lifecycle. Step-level events (started, completed, failed, skipped) are recorded with timestamps and metadata, giving you a complete timeline of every execution.
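A sketch of how a trace ID ties the timeline together: every step event records the run's trace ID, so filtering on it reassembles the full execution history. Event names and the log shape are hypothetical.

```python
from datetime import datetime, timezone

def record(log, trace_id, step, event):
    # Every step-level event carries the run's trace ID and a timestamp.
    log.append({
        "trace_id": trace_id,
        "step": step,
        "event": event,
        "at": datetime.now(timezone.utc).isoformat(),
    })

log = []
record(log, "tr-abc", "fetch", "started")
record(log, "tr-abc", "fetch", "completed")
record(log, "tr-xyz", "other", "started")  # a different run's event

# Correlate: the trace ID recovers this run's complete timeline.
timeline = [e for e in log if e["trace_id"] == "tr-abc"]
print([e["event"] for e in timeline])  # ['started', 'completed']
```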


For practical workflow building, see Building workflows. For real-world examples, see Workflow recipes.