# How workflows work
Execution model, reliability, and operational guarantees.
Lightfield workflows run on a durable execution engine designed to process millions of events daily. This page covers the architecture at a high level for teams evaluating Lightfield’s reliability and operational guarantees.
## Durable execution
Workflow steps can take a long time. An AI agent reasoning over a complex payload might run for 60 seconds. An HTTP request to a slow external API might take 30. Lightfield’s execution engine is designed around this reality.
Slow work (external calls, AI execution) is separated from fast work (state updates, scheduling the next step). The engine stays responsive regardless of how long individual steps take, and no step blocks another from making progress.
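The separation can be sketched as follows. This is an illustrative model, not Lightfield’s actual API: fast bookkeeping runs inline, while slow step bodies are handed to a worker pool so no step blocks another.

```python
import concurrent.futures
import time

class Engine:
    """Sketch of slow/fast separation (names are illustrative): state
    updates happen inline; slow step bodies run on a worker pool."""

    def __init__(self):
        self.state = {}  # fast, in-process bookkeeping
        self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def run_step(self, run_id, step_fn):
        # Fast work: record that the step was scheduled (returns immediately).
        self.state[run_id] = "scheduled"
        # Slow work: the step body itself is handed off to a worker.
        future = self.pool.submit(step_fn)
        future.add_done_callback(lambda f: self.state.__setitem__(run_id, "done"))
        return future

engine = Engine()
slow = engine.run_step("run-1", lambda: time.sleep(0.2) or "slow result")
fast = engine.run_step("run-2", lambda: "fast result")
# The fast step completes even while the slow one is still running.
print(fast.result(timeout=1))  # fast result
print(slow.result(timeout=1))  # slow result
```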
Each step’s side effects are idempotent. If a step is retried after a transient failure, it produces the same result without duplicating records or sending duplicate requests.
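Idempotent side effects typically mean keying each write, so a retry replays the cached result instead of duplicating the work. A minimal sketch (the class and method names are hypothetical):

```python
class IdempotentWriter:
    """Sketch of an idempotent write path: each side effect carries a key,
    so a retried step returns the original result instead of duplicating it."""

    def __init__(self):
        self._results = {}
        self.records = []

    def create_record(self, idempotency_key, payload):
        if idempotency_key in self._results:      # retry: replay cached result
            return self._results[idempotency_key]
        self.records.append(payload)              # first attempt: do the write
        self._results[idempotency_key] = payload
        return payload

db = IdempotentWriter()
db.create_record("step-42", {"name": "Acme"})
db.create_record("step-42", {"name": "Acme"})  # simulated retry after a failure
print(len(db.records))  # 1 — still a single record
```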
## Event processing
When a trigger fires (webhook received, record changed, schedule ticks), the event is captured atomically. Events are never lost, even if the system crashes immediately after the trigger.
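Atomic capture is commonly achieved with an outbox-style pattern: the record change and its trigger event commit in a single transaction, so a crash can never persist one without the other. A sketch with SQLite (table and column names are illustrative, not Lightfield’s schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (id TEXT PRIMARY KEY, stage TEXT)")
db.execute("CREATE TABLE trigger_events (id INTEGER PRIMARY KEY, record_id TEXT, kind TEXT)")

with db:  # one transaction: both rows commit, or neither does
    db.execute("INSERT INTO records VALUES ('opp-1', 'won')")
    db.execute("INSERT INTO trigger_events (record_id, kind) VALUES ('opp-1', 'record_changed')")

events = db.execute("SELECT record_id, kind FROM trigger_events").fetchall()
print(events)  # [('opp-1', 'record_changed')]
```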
A sharded background processor dispatches events to workflow executions. Sharding ensures that a single high-volume workflow can’t starve others. Each event is processed with a scoped idempotency key, so replaying events after a failure never creates duplicate workflow runs.
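Both properties can be sketched in a few lines (shard count, key format, and function names are hypothetical): hashing the workflow id pins a busy workflow to one shard, and scoping the idempotency key to (workflow, event) makes replay a no-op.

```python
import hashlib

NUM_SHARDS = 8  # illustrative shard count

def shard_for(workflow_id: str) -> int:
    """A busy workflow hashes to a single shard, so it can't starve others."""
    return int(hashlib.sha256(workflow_id.encode()).hexdigest(), 16) % NUM_SHARDS

seen_keys = set()

def dispatch(event_id: str, workflow_id: str) -> bool:
    """Replay-safe dispatch: the idempotency key is scoped to
    (workflow, event), so reprocessing never creates a duplicate run."""
    key = f"{workflow_id}:{event_id}"
    if key in seen_keys:
        return False          # already dispatched; replay is a no-op
    seen_keys.add(key)
    return True               # first delivery: start a workflow run

print(dispatch("evt-1", "wf-a"))  # True
print(dispatch("evt-1", "wf-a"))  # False — replay after a crash
print(dispatch("evt-1", "wf-b"))  # True — different workflow, different scope
```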
For lifecycle triggers, event matching is precise. If a trigger watches the `stage` and `amount` fields on an opportunity, and an update only changes the `description` field, the trigger doesn’t fire. This filtering happens before a workflow ever starts executing.
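The filtering logic amounts to diffing only the watched fields between record versions. A minimal sketch (the function name and record shape are illustrative):

```python
def should_fire(watched_fields, old, new):
    """Fire only when a watched field actually changed between the
    old and new versions of the record."""
    return any(old.get(f) != new.get(f) for f in watched_fields)

old = {"stage": "open", "amount": 100, "description": "draft"}
update_desc = {**old, "description": "edited"}
update_stage = {**old, "stage": "won"}

print(should_fire({"stage", "amount"}, old, update_desc))   # False — only description changed
print(should_fire({"stage", "amount"}, old, update_stage))  # True — a watched field changed
```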
## Reliability guarantees
| Guarantee | How |
|---|---|
| Events are never lost | Trigger events are captured atomically with the originating change |
| No duplicate executions | Scoped idempotency keys on every event and every write |
| Safe concurrent execution | Compare-and-swap concurrency control prevents state corruption |
| Safe to edit active workflows | Immutable version snapshots; running executions are pinned to their start version |
| Automatic retries | Transient failures (network errors, 502/503/504) retry with exponential backoff |
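The compare-and-swap row in the table can be illustrated as version-checked writes (the class and field names are hypothetical, not Lightfield’s data model): a write succeeds only if the caller saw the latest version, so two concurrent writers can’t silently overwrite each other.

```python
class RunState:
    """Sketch of compare-and-swap concurrency control: every successful
    write bumps a version, and stale writers are rejected."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def compare_and_swap(self, expected_version, updates):
        if self.version != expected_version:
            return False            # stale writer: reload state and retry
        self.data.update(updates)
        self.version += 1
        return True

state = RunState()
ok_a = state.compare_and_swap(0, {"step": "a-done"})   # writer A wins
ok_b = state.compare_and_swap(0, {"step": "b-done"})   # writer B saw version 0: rejected
print(ok_a, ok_b, state.data)  # True False {'step': 'a-done'}
```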
## Error handling
Errors are classified as permanent or retryable. Transient failures (network timeouts, HTTP 502/503/504) are retried automatically with backoff. Permanent failures (invalid configuration, template resolution errors) fail the step immediately and skip subsequent steps.
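A sketch of this classification plus an exponential backoff schedule (the error class, function names, and exact delays are illustrative; production schedules usually add jitter):

```python
class HttpError(Exception):
    """Illustrative error carrying an HTTP status code."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

RETRYABLE_STATUSES = {502, 503, 504}

def classify(error):
    """Map an error to 'retryable' (transient) or 'permanent'."""
    if isinstance(error, TimeoutError):
        return "retryable"
    if getattr(error, "status", None) in RETRYABLE_STATUSES:
        return "retryable"
    return "permanent"   # e.g. invalid configuration, template errors

def backoff_delays(base=1.0, factor=2.0, attempts=4):
    """Exponential backoff schedule in seconds."""
    return [base * factor**i for i in range(attempts)]

print(classify(HttpError(503)))   # retryable
print(classify(HttpError(400)))   # permanent
print(backoff_delays())           # [1.0, 2.0, 4.0, 8.0]
```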
All errors carry structured, namespaced error codes and metadata, making them queryable and debuggable without parsing log messages.
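A structured error might look like the following (the shape and code string are illustrative, not Lightfield’s actual format): because the error is data rather than a log line, it can be filtered by namespace or metadata directly.

```python
# Sketch of a structured, namespaced error: queryable without parsing logs.
error = {
    "code": "workflow.step.http.timeout",   # namespaced, machine-readable
    "retryable": True,
    "metadata": {"step_id": "step-3", "timeout_ms": 30000},
}
print(error["code"].split(".")[0])  # workflow — query by namespace
```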
## Observability
Every workflow run carries a trace ID that correlates across the full execution lifecycle. Step-level events (started, completed, failed, skipped) are recorded with timestamps and metadata, giving you a complete timeline of every execution.
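Conceptually, every step event carries the run’s trace ID and a timestamp, so the full timeline can be reassembled from the event stream. A sketch (the class and event shape are hypothetical):

```python
import time
import uuid

class Tracer:
    """Sketch of step-level observability: one trace id per run,
    one timestamped event per step-status transition."""

    def __init__(self):
        self.trace_id = str(uuid.uuid4())
        self.events = []

    def record(self, step, status, **metadata):
        self.events.append({
            "trace_id": self.trace_id,
            "step": step,
            "status": status,           # started | completed | failed | skipped
            "ts": time.time(),
            **metadata,
        })

trace = Tracer()
trace.record("fetch_record", "started")
trace.record("fetch_record", "completed", duration_ms=12)
trace.record("notify_slack", "skipped", reason="condition not met")
# Every event shares the same trace id, giving a complete timeline.
print(len(trace.events), len({e["trace_id"] for e in trace.events}))  # 3 1
```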
For practical workflow building, see Building workflows. For real-world examples, see Workflow recipes.