
How workflows work

Execution model, reliability, and operational guarantees.

Lightfield workflows run on a durable execution engine designed to process millions of events daily. This page covers the architecture at a high level for teams evaluating Lightfield’s reliability and operational guarantees.

Workflow steps can take a long time. An AI agent reasoning over a complex payload might run for 60 seconds; an HTTP request to a slow external API might take 30 seconds. Lightfield's execution engine is designed around this reality.

Slow work (external calls, AI execution) is separated from fast work (state updates, scheduling the next step). The engine stays responsive regardless of how long individual steps take, and no step blocks another from making progress.
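A minimal sketch of this separation, using hypothetical names (`advance`, `slow_work`): the fast path updates state and schedules the next step synchronously, while slow work is handed to a queue for separate workers to drain, so nothing blocks the engine.

```python
import queue

# Fast work is done inline; slow work goes onto a queue for workers.
slow_work = queue.Queue()
state = {}

def advance(run_id, step):
    # Fast work: update execution state and schedule the next step.
    state[run_id] = {"current_step": step, "status": "running"}
    # Slow work: hand the external call to a worker instead of blocking here.
    slow_work.put((run_id, step))

advance("run-1", "call-external-api")
print(state["run-1"]["current_step"])  # call-external-api
print(slow_work.qsize())               # 1 item waiting for a worker
```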

Each step’s side effects are idempotent. If a step is retried after a transient failure, it produces the same result without duplicating records or sending duplicate requests.
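The idea can be sketched as keyed writes: a hypothetical `write_record` helper remembers each idempotency key, so a retried step replays as a no-op instead of creating a second record.

```python
# Side effects are keyed; a retry with the same key returns the original
# result rather than performing the write again.
written = {}

def write_record(idempotency_key, payload):
    if idempotency_key in written:
        return written[idempotency_key]   # retry: reuse the first result
    written[idempotency_key] = payload    # first attempt: perform the write
    return payload

write_record("step-7:create-contact", {"name": "Ada"})
write_record("step-7:create-contact", {"name": "Ada"})  # retried after a failure
print(len(written))  # 1 record, not 2
```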

When a trigger fires (webhook received, record changed, schedule ticks), the event is captured atomically. Events are never lost, even if the system crashes immediately after the trigger.
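One common way to get this property is to commit the record change and its trigger event in a single transaction, so a crash can never persist one without the other. A sketch with SQLite standing in for the real store (table names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (id TEXT PRIMARY KEY, stage TEXT)")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, record_id TEXT, kind TEXT)")

# One transaction covers both the change and the event that announces it.
with db:
    db.execute("INSERT INTO records VALUES ('opp-1', 'won')")
    db.execute("INSERT INTO events (record_id, kind) VALUES ('opp-1', 'record_changed')")

print(db.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 1
```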

A sharded background processor dispatches events to workflow executions. Sharding ensures that a single high-volume workflow can’t starve others. Each event is processed with a scoped idempotency key, so replaying events after a failure never creates duplicate workflow runs.
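A sketch of both ideas, with hypothetical names: events are routed to a shard by workflow, and run creation is deduplicated by a (workflow, event) key, so a replayed event maps back to the run it already created.

```python
NUM_SHARDS = 8
runs = {}

def shard_for(workflow_id):
    # A hot workflow hashes to one shard and can't starve the others.
    return hash(workflow_id) % NUM_SHARDS

def dispatch(workflow_id, event_id):
    key = (workflow_id, event_id)     # scoped idempotency key
    if key in runs:
        return runs[key]              # replayed event: reuse the existing run
    runs[key] = f"run-{len(runs) + 1}"
    return runs[key]

first = dispatch("wf-hot", "evt-42")
replay = dispatch("wf-hot", "evt-42")  # replay after a crash
print(first == replay)  # True: exactly one workflow run
```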

For lifecycle triggers, event matching is precise. If a trigger watches the stage and amount fields on an opportunity, and an update only changes the description field, the trigger doesn’t fire. This filtering happens before a workflow ever starts executing.
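The filtering rule amounts to a set intersection between watched fields and changed fields; a sketch with a hypothetical `should_fire` helper:

```python
def should_fire(watched_fields, changed_fields):
    # Fire only if at least one watched field actually changed.
    return bool(watched_fields & changed_fields)

watched = {"stage", "amount"}
print(should_fire(watched, {"description"}))     # False: no watched field changed
print(should_fire(watched, {"stage", "owner"}))  # True: stage is watched
```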

| Guarantee | How |
| --- | --- |
| Events are never lost | Trigger events are captured atomically with the originating change |
| No duplicate executions | Scoped idempotency keys on every event and every write |
| Safe concurrent execution | Compare-and-swap concurrency control prevents state corruption |
| Safe to edit active workflows | Immutable version snapshots; running executions are pinned to their start version |
| Automatic retries | Transient failures (network errors, 502/503/504) retry with exponential backoff |

Errors are classified as permanent or retryable. Transient failures (network timeouts, HTTP 502/503/504) are retried automatically with backoff. Permanent failures (invalid configuration, template resolution errors) fail the step immediately and skip subsequent steps.
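A sketch of the classification and backoff schedule. The status codes match the ones listed above; the classifier shape, error fields, and delay values are illustrative, not Lightfield's actual configuration.

```python
RETRYABLE_STATUSES = {502, 503, 504}

def classify(error):
    # Transient failures are retried; everything else fails immediately.
    if error.get("type") == "network_timeout":
        return "retryable"
    if error.get("status") in RETRYABLE_STATUSES:
        return "retryable"
    return "permanent"

def backoff_delays(base=1.0, attempts=4):
    # Exponential backoff: each retry waits twice as long as the last.
    return [base * 2 ** n for n in range(attempts)]

print(classify({"status": 503}))             # retryable
print(classify({"type": "invalid_config"}))  # permanent
print(backoff_delays())                      # [1.0, 2.0, 4.0, 8.0]
```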

All errors carry structured, namespaced error codes and metadata, making them queryable and debuggable without parsing log messages.
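What "queryable" buys you can be sketched as filtering on a namespaced code rather than grepping messages. The code format and metadata fields below are illustrative assumptions, not Lightfield's actual error schema.

```python
errors = [
    {"code": "http.request_timeout", "meta": {"step": 2, "url": "https://api.example.com"}},
    {"code": "template.missing_variable", "meta": {"step": 1, "variable": "email"}},
    {"code": "http.bad_gateway", "meta": {"step": 4, "status": 502}},
]

# Filter by namespace instead of parsing free-text log messages.
http_errors = [e for e in errors if e["code"].startswith("http.")]
print(len(http_errors))  # 2
```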

Every workflow run carries a trace ID that correlates across the full execution lifecycle. Step-level events (started, completed, failed, skipped) are recorded with timestamps and metadata, giving you a complete timeline of every execution.
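A sketch of how a trace ID ties the timeline together: every step event records the run's trace ID, so filtering on it reassembles the full execution history. Event names and the log shape are hypothetical.

```python
from datetime import datetime, timezone

def record(log, trace_id, step, event):
    # Every step-level event carries the run's trace ID and a timestamp.
    log.append({
        "trace_id": trace_id,
        "step": step,
        "event": event,
        "at": datetime.now(timezone.utc).isoformat(),
    })

log = []
record(log, "tr-abc", "fetch", "started")
record(log, "tr-abc", "fetch", "completed")
record(log, "tr-xyz", "other", "started")  # a different run's event

# Correlate: the trace ID recovers this run's complete timeline.
timeline = [e for e in log if e["trace_id"] == "tr-abc"]
print([e["event"] for e in timeline])  # ['started', 'completed']
```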


For practical workflow building, see Building workflows. For real-world examples, see Workflow recipes.