Add CreateCallbackAsync and WaitForCallbackAsync (DOTNET-8660) #2373

Merged
GarrettBeatty merged 4 commits into feature/durablefunction from gcbeatty/durable-callbacks on May 28, 2026

@GarrettBeatty GarrettBeatty commented May 14, 2026

#2216

Fixes DOTNET-8660.

What

Adds callback support to Amazon.Lambda.DurableExecution. A workflow can now hand a service-allocated CallbackId to an external system (a queue consumer, a human-approval UI, a long-running job runner) and suspend until that system reports back via the durable execution service. Two entry points: a low-level handle (CreateCallbackAsync) and the common "submit + wait" composition (WaitForCallbackAsync).

Public API:

| Type | Purpose |
| --- | --- |
| `IDurableContext.CreateCallbackAsync<T>(...)` | Allocate a callback; returns an `ICallback<T>` handle. Errors are deferred to GetResultAsync so user code between create and await runs deterministically across replays. |
| `IDurableContext.WaitForCallbackAsync<T>(...)` | Composite: CreateCallback + submitter step + GetResultAsync inside a child context. Common path for "submit job, wait for completion". |
| `ICallback<T>` | Handle exposing CallbackId (give to the external system) and GetResultAsync (suspends until completion). |
| `IWaitForCallbackContext` | Logger-only context passed to the submitter delegate. Distinct from IStepContext so the submitter API can evolve independently. |
| `CallbackConfig` | Timeout + HeartbeatTimeout. Sub-second positive values are rejected (service timer granularity is 1s); TimeSpan.Zero disables. |
| `WaitForCallbackConfig : CallbackConfig` | Adds RetryStrategy for the submitter step. |
| `CallbackException` (base) + `CallbackFailedException`, `CallbackTimeoutException`, `CallbackSubmitterException` | Subclass tree so catch clauses can pattern-match the failure mode. Carries CallbackId, ErrorType, ErrorData, OriginalStackTrace. |
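
To illustrate how the exception subclass tree lets catch clauses pick out the failure mode, here is a minimal sketch. The exception types below are stand-ins declared locally: their names follow the table, but the constructors, the single CallbackId property, and the `CallbackErrorHandling.Describe` helper are assumptions made for the example, not the SDK's actual definitions.

```csharp
using System;

// Stand-in types for illustration: names follow the PR description, but the
// constructors and the single CallbackId property are assumptions.
public class CallbackException : Exception
{
    public string CallbackId { get; }

    public CallbackException(string callbackId, string message) : base(message)
    {
        CallbackId = callbackId;
    }
}

public sealed class CallbackFailedException : CallbackException
{
    public CallbackFailedException(string callbackId, string message) : base(callbackId, message) { }
}

public sealed class CallbackTimeoutException : CallbackException
{
    public CallbackTimeoutException(string callbackId, string message) : base(callbackId, message) { }
}

public sealed class CallbackSubmitterException : CallbackException
{
    public CallbackSubmitterException(string callbackId, string message) : base(callbackId, message) { }
}

public static class CallbackErrorHandling
{
    // Hypothetical helper: runs the delegate and reports which failure mode occurred.
    // Specific subclasses are caught first; the base type is the fallback.
    public static string Describe(Action work)
    {
        try
        {
            work();
            return "completed";
        }
        catch (CallbackTimeoutException ex) { return $"timed out: {ex.CallbackId}"; }
        catch (CallbackSubmitterException ex) { return $"submitter failed: {ex.CallbackId}"; }
        catch (CallbackFailedException ex) { return $"failed: {ex.CallbackId}"; }
        catch (CallbackException ex) { return $"callback error: {ex.CallbackId}"; }
    }
}
```

In real workflow code the `try` body would presumably await GetResultAsync or WaitForCallbackAsync; a plain `Action` is used here only to keep the sketch self-contained.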

Both APIs read the ILambdaSerializer from ILambdaContext.Serializer (typically registered via LambdaBootstrapBuilder.Create(handler, serializer)) and throw InvalidOperationException if no serializer is registered. AOT and reflection-based scenarios share a single overload — the AOT story is determined entirely by the registered serializer (e.g., SourceGeneratorLambdaJsonSerializer<TContext> for AOT).

How

Internal/CallbackOperation<T> mirrors the Step/Wait pattern from #2360 and the child-context pattern from #2370:

  • Fresh execution. Synchronously flushes a CALLBACK START checkpoint. The service stamps a freshly-allocated CallbackId onto the response; LambdaDurableServiceClient gains an onNewOperations hook so that ID flows back into ExecutionState during the START flush, where the operation can read it. The handle is returned immediately — CreateCallbackAsync always succeeds.
  • GetResultAsync suspends. On the invocation that first reaches the await, the workflow hits Termination.SuspendAndAwait and Lambda exits. When the external system delivers a result, the service re-invokes; replay observes the terminal checkpoint and returns (or throws) immediately.
  • Replay. SUCCEEDED returns the cached value (deserialized via the registered ILambdaSerializer). FAILED throws CallbackFailedException. TIMED_OUT throws CallbackTimeoutException. STARTED / PENDING re-suspend (external system hasn't responded yet). Any other status throws NonDeterministicExecutionException.
  • Deferred error propagation. Terminal status observed during Start/Replay is stashed on _terminalReplay and only resolved inside GetResultAsync. This keeps CreateCallbackAsync deterministically successful, so user code between create and await sees the same control flow on fresh execution and replay.
  • WaitForCallbackAsync composes RunInChildContextAsync (from #2370) + CreateCallbackAsync + a submitter StepAsync + GetResultAsync. The child-context wrapper gives a clean observability boundary (SubType = WaitForCallback) and a single error-mapping site: submitter step failures surface as CallbackSubmitterException; callback failures/timeouts preserve their subclass through child-context replay (a CallbackTimeoutException thrown inside the child remains a CallbackTimeoutException after the parent CONTEXT-FAILED replay).
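
The replay rules and the deferred error propagation described above can be modeled in a few lines. This is a simplified sketch, not the code in Internal/CallbackOperation: `ReplayAction`, `CallbackReplay`, and `DeferredHandle` are hypothetical names, and representing the checkpoint status as a plain string is an assumption.

```csharp
using System;

// Hypothetical model of what replay does for each checkpointed status.
public enum ReplayAction
{
    ReturnCachedResult,
    ThrowFailed,
    ThrowTimedOut,
    Suspend,
    ThrowNonDeterministic
}

public static class CallbackReplay
{
    // Mirrors the "Replay" bullet: terminal statuses resolve, in-flight
    // statuses re-suspend, anything else is treated as non-deterministic.
    public static ReplayAction Resolve(string status) => status switch
    {
        "SUCCEEDED" => ReplayAction.ReturnCachedResult,
        "FAILED" => ReplayAction.ThrowFailed,
        "TIMED_OUT" => ReplayAction.ThrowTimedOut,
        "STARTED" or "PENDING" => ReplayAction.Suspend,
        _ => ReplayAction.ThrowNonDeterministic,
    };
}

// Deferred error propagation: constructing the handle never fails, even when
// the observed status is terminal. The status is only interpreted on GetResult.
public sealed class DeferredHandle
{
    private readonly string _observedStatus;

    public string CallbackId { get; }

    public DeferredHandle(string callbackId, string observedStatus)
    {
        CallbackId = callbackId;
        _observedStatus = observedStatus; // stored, not acted on
    }

    public ReplayAction GetResult() => CallbackReplay.Resolve(_observedStatus);
}
```

Because the handle is created the same way whether the status is FAILED or still PENDING, code that runs between creation and the result call follows the same path on a fresh execution and on replay.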

Testing

42 new unit tests across CallbackOperationTests, WaitForCallbackTests, DurableFunctionTests, and ExceptionsTests:

  • Fresh execution + sync-flush of CALLBACK START (CallbackId stamped onto state).
  • Suspend on first GetResultAsync; replay returns cached value without re-running.
  • Terminal-state replay: SUCCEEDED deserializes, FAILED throws CallbackFailedException, TIMED_OUT throws CallbackTimeoutException.
  • STARTED/PENDING replay re-suspends; unknown status throws NonDeterministicExecutionException.
  • CreateCallbackAsync is always successful even for terminal-state replays (deferred error propagation).
  • CallbackConfig.Timeout / HeartbeatTimeout validation: rejects negative and sub-second positive values, accepts TimeSpan.Zero.
  • WaitForCallbackAsync: submitter receives the CallbackId; submitter failure (after retries exhausted) surfaces as CallbackSubmitterException; callback failure/timeout subclass survives parent CONTEXT-FAILED replay; happy-path returns the deserialized result.
  • Exception type hierarchy + property serialization round-trip.
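
The Timeout / HeartbeatTimeout rule covered by these tests (negative rejected, sub-second positive rejected, TimeSpan.Zero accepted) could be expressed roughly as follows. `TimeoutValidation.Validate` is a hypothetical helper, and the choice of ArgumentOutOfRangeException is an assumption; the PR description does not say which exception type CallbackConfig throws.

```csharp
using System;

public static class TimeoutValidation
{
    // Hypothetical validator mirroring the described rule:
    //   negative          -> rejected
    //   0 < value < 1s    -> rejected (service timer granularity is 1s)
    //   TimeSpan.Zero     -> accepted (timeout disabled)
    //   value >= 1s       -> accepted
    public static TimeSpan Validate(TimeSpan value, string paramName)
    {
        if (value < TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(paramName, value, "Timeout must not be negative.");
        }

        if (value > TimeSpan.Zero && value < TimeSpan.FromSeconds(1))
        {
            throw new ArgumentOutOfRangeException(paramName, value, "Positive timeouts must be at least 1 second.");
        }

        return value;
    }
}
```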

5 new integration tests (require AWS credentials to run):
CreateCallbackHappyPath, CallbackTimeout, CallbackFailed, WaitForCallbackHappyPath, WaitForCallbackSubmitterFails. Each ships a deployable test function + Dockerfile under IntegrationTests/TestFunctions/.

203/203 unit tests pass on net8.0 and net10.0 (161 base + 42 new). Production build clean: 0 warnings, TreatWarningsAsErrors enforced.

Out of scope (follow-up PRs)

  • InvokeAsync / MapAsync / ParallelAsync / WaitForConditionAsync
  • DefaultJsonCheckpointSerializer
  • DurableLogger replay-suppression (currently NullLogger)
  • Annotations source-generator integration / [DurableExecution] attribute
  • DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
  • dotnet new lambda.DurableFunction blueprint


COPY bin/publish/ ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/var/task/bootstrap"]

@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch from 464c591 to d308c3b Compare May 14, 2026 21:49
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch 2 times, most recently from 951fcd1 to 1c88461 Compare May 14, 2026 22:19
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch from d308c3b to be4c3ad Compare May 18, 2026 15:23
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch from 1c88461 to 5cc9a04 Compare May 18, 2026 15:46
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch 3 times, most recently from ad4d208 to 3acbed5 Compare May 20, 2026 17:46
Base automatically changed from gcbeatty/durable-wave0 to gcbeatty/durable-child-context May 20, 2026 17:46
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch 3 times, most recently from 0d5a1f9 to fc5dbbd Compare May 20, 2026 18:12

@GarrettBeatty GarrettBeatty added the Release Not Needed label May 20, 2026
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch 2 times, most recently from f59dba9 to efe77ee Compare May 20, 2026 18:44
@GarrettBeatty GarrettBeatty requested a review from Copilot May 20, 2026 18:50

Copilot AI left a comment

Pull request overview

Adds first-class callback support to the Amazon.Lambda.DurableExecution .NET SDK, enabling workflows to pause until an external system completes a callback, and providing a convenience “submit + wait” composite API.

Changes:

  • Introduces CreateCallbackAsync<T> / ICallback<T> for durable callback handles and result retrieval.
  • Adds WaitForCallbackAsync<T> with WaitForCallbackConfig + IWaitForCallbackContext, including retry wiring and error remapping.
  • Extends checkpoint plumbing to merge service-returned NewExecutionState.Operations into in-memory ExecutionState, plus broad unit/integration test coverage and design-doc updates.

Reviewed changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/WaitForCallbackTests.cs | New unit tests covering WaitForCallbackAsync behavior, naming, determinism, and exception mapping. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/RecordingBatcher.cs | Adds a flush hook used by tests to simulate service-side state updates (e.g., CallbackId allocation). |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/MockLambdaClient.cs | Adds a customizable checkpoint response handler for tests modeling NewExecutionState behavior. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/ExceptionsTests.cs | Adds ctor/property tests for new callback exception types. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableFunctionTests.cs | Adds end-to-end unit tests through DurableFunction.WrapAsync for callback allocation/replay determinism. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/CallbackOperationTests.cs | New unit tests covering callback operation start/replay/result/error behavior and serializer requirements. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/WaitForCallbackSubmitterFailsTest.cs | New integration test validating submitter failure surfaces as CallbackSubmitterException. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/WaitForCallbackHappyPathTest.cs | New integration test validating two-Lambda "external system" happy path for WaitForCallbackAsync. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackSubmitterFailsFunction/WaitForCallbackSubmitterFailsFunction.csproj | New test function project used by integration tests. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackSubmitterFailsFunction/Function.cs | Workflow test function that intentionally fails the submitter. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackSubmitterFailsFunction/Dockerfile | Container packaging for the submitter-failure integration test function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackHappyPathFunction/WaitForCallbackHappyPathFunction.csproj | New test function project for happy-path WaitForCallback integration. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackHappyPathFunction/Function.cs | Workflow test function that invokes an external approver Lambda and waits for callback completion. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackHappyPathFunction/Dockerfile | Container packaging for the happy-path workflow function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CreateCallbackHappyPathFunction/Function.cs | Workflow test function that uses CreateCallbackAsync and suspends on GetResultAsync. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CreateCallbackHappyPathFunction/Dockerfile | Container packaging for CreateCallback happy-path integration test function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CreateCallbackHappyPathFunction/CreateCallbackHappyPathFunction.csproj | New CreateCallback integration test function project. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackTimeoutFunction/Function.cs | Workflow test function validating callback timeouts via CallbackConfig.Timeout. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackTimeoutFunction/Dockerfile | Container packaging for callback-timeout integration test function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackTimeoutFunction/CallbackTimeoutFunction.csproj | New callback-timeout integration test function project. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackFailedFunction/Function.cs | Workflow test function validating callback failure delivery via SendDurableExecutionCallbackFailure. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackFailedFunction/Dockerfile | Container packaging for callback-failure integration test function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackFailedFunction/CallbackFailedFunction.csproj | New callback-failure integration test function project. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ApproverFunction/Function.cs | External "approver" Lambda that completes callbacks via SendDurableExecutionCallbackSuccess. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ApproverFunction/Dockerfile | Container packaging for external approver function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ApproverFunction/ApproverFunction.csproj | New external approver function project (includes AWSSDK.Lambda). |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/DurableFunctionDeployment.cs | Enhances deployment helper to optionally deploy a paired external Lambda; adds cross-process build locking. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/CreateCallbackHappyPathTest.cs | New integration test for CreateCallback success delivery. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/CallbackTimeoutTest.cs | New integration test for callback timeout surface/type recording. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/CallbackFailedTest.cs | New integration test for callback failure surface/type recording. |
| Libraries/src/Amazon.Lambda.DurableExecution/WaitForCallbackConfig.cs | Adds configuration type for WaitForCallback (inherits callback timeouts + submitter retry strategy). |
| Libraries/src/Amazon.Lambda.DurableExecution/Services/LambdaDurableServiceClient.cs | Adds onNewOperations callback to checkpointing and maps callback ops from NewExecutionState. |
| Libraries/src/Amazon.Lambda.DurableExecution/Operation.cs | Adds WaitForCallback subtype constant and TIMED_OUT status constant. |
| Libraries/src/Amazon.Lambda.DurableExecution/IWaitForCallbackContext.cs | New submitter-context interface (logger-only) for WaitForCallbackAsync. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/LambdaSerializerHelper.cs | Centralizes "serializer required" enforcement and message. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/CallbackOperation.cs | Implements durable callback operation semantics and `ICallback<T>` handle behavior. |
| Libraries/src/Amazon.Lambda.DurableExecution/IDurableContext.cs | Adds public API surface for `CreateCallbackAsync<T>` and `WaitForCallbackAsync<T>`. |
| Libraries/src/Amazon.Lambda.DurableExecution/ICallback.cs | Introduces public callback handle interface (CallbackId + GetResultAsync). |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableFunction.cs | Wires serializer helper and merges NewExecutionState operations into ExecutionState during checkpointing. |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs | Implements CreateCallbackAsync and WaitForCallbackAsync composition + error mapping. |
| Libraries/src/Amazon.Lambda.DurableExecution/CallbackException.cs | Adds callback exception hierarchy (CallbackException + Failed/Timeout/Submitter). |
| Libraries/src/Amazon.Lambda.DurableExecution/CallbackConfig.cs | Adds timeout + heartbeat timeout config with sub-second validation. |
| Docs/durable-execution-design.md | Updates design doc to reflect callback APIs, contexts, and exception hierarchy. |

@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-child-context branch from 646b841 to 4d97473 Compare May 21, 2026 15:22
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch from efe77ee to cf0c2b2 Compare May 21, 2026 18:49
Comment thread .autover/changes/118fe72a-a0e5-4119-ae9a-7d6e41ef1b71.json Outdated
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-child-context branch from 4d97473 to 8a6c41c Compare May 21, 2026 18:56
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch from b18ef28 to fda5bfd Compare May 22, 2026 15:51
Base automatically changed from gcbeatty/durable-child-context to feature/durablefunction May 23, 2026 15:58
Adds callback support to the .NET Durable Execution SDK. CreateCallbackAsync
returns an ICallback<T> handle (CallbackId + GetResultAsync) that suspends
the workflow until an external system delivers a result via the durable
execution service. WaitForCallbackAsync composes CreateCallback + a
submitter step + GetResultAsync inside a child context for the common
"submit and wait" pattern.

Public surface:
- IDurableContext.CreateCallbackAsync<T> (single overload)
- IDurableContext.WaitForCallbackAsync<T> (single overload)
- ICallback<T> with CallbackId and GetResultAsync
- IWaitForCallbackContext (Logger only) for submitter functions
- CallbackConfig (Timeout + HeartbeatTimeout, validates sub-second values)
- WaitForCallbackConfig : CallbackConfig adds RetryStrategy
- Exception subclass tree: CallbackException base + CallbackFailedException,
  CallbackTimeoutException, CallbackSubmitterException

Both APIs read the ILambdaSerializer from ILambdaContext.Serializer
(typically registered via LambdaBootstrapBuilder.Create(handler, serializer))
and throw InvalidOperationException if no serializer is registered. AOT and
reflection-based scenarios share a single overload — the AOT story is
determined by the registered serializer.

Internal:
- CallbackOperation<T> handles fresh execution sync-flush of START with
  service-allocated CallbackId, deferred error propagation, and replay
  for SUCCEEDED/FAILED/TIMED_OUT/STARTED/PENDING. Unknown statuses throw
  NonDeterministicExecutionException.
- LambdaDurableServiceClient gains an onNewOperations callback so the
  freshly-allocated CallbackId from NewExecutionState flows back into
  ExecutionState during the START flush.
- WaitForCallback's error mapping preserves subclass fidelity on
  parent-CONTEXT-FAILED replay (CallbackTimeoutException remains
  CallbackTimeoutException, etc.).

Adds unit tests + integration tests covering happy path, timeout,
failure, submitter failure, replay determinism, and replay of each
exception subtype.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Delete .autover/changes/118fe72a-a0e5-4119-ae9a-7d6e41ef1b71.json
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch from fda5bfd to 8c443b4 Compare May 23, 2026 16:52
- LambdaDurableServiceClient: map StackTrace and ErrorData on
  CallbackDetails.Error, matching the Step/Context error mappings.
  Without this, callback errors delivered via NewExecutionState lose
  diagnostic detail that BuildFailedException/BuildTimeoutException
  expect to surface.
- CallbackOperation.GetResultAsync: re-read State.GetOperation before
  suspending. A later checkpoint in the same invocation (e.g. the
  WaitForCallback submitter-step flush) can merge a terminal status
  via NewExecutionState; resolving immediately avoids a wasted
  reinvocation. Matches the Java (onCheckpointComplete) and JS
  (waitForStatusChange) behavior.
@GarrettBeatty GarrettBeatty marked this pull request as ready for review May 27, 2026 15:59
@GarrettBeatty GarrettBeatty requested review from a team as code owners May 27, 2026 15:59
@GarrettBeatty GarrettBeatty requested review from normj and philasmar and removed request for a team May 27, 2026 15:59

@philasmar philasmar left a comment

Important Issues

  1. Thread-safety of ExecutionState.AddOperations called from batcher callback
    - File: Libraries/src/Amazon.Lambda.DurableExecution/DurableFunction.cs (line ~99)
    - state.AddOperations is wired as the onNewOperations callback from CheckpointBatcher. The batcher worker runs on a background thread while the user's workflow code reads from ExecutionState on the workflow thread. If AddOperations or GetOperation aren't synchronized, this is a data race. Worth verifying that ExecutionState is either thread-safe or that the batcher guarantees the callback runs before the EnqueueAsync task completes (which appears to be the case given the sync-flush semantics, but should be documented).
  2. CallbackOperation implements ICallback directly: lifetime coupling
    - File: Libraries/src/Amazon.Lambda.DurableExecution/Internal/CallbackOperation.cs
    - The operation is the callback handle returned to user code. This means the user holds a reference to the full DurableOperation machinery (State, Termination, Batcher references) for the lifetime of the handle. This is fine functionally, but be aware that if the user stores ICallback long-term (e.g., in a field), those references remain rooted. The same pattern exists in the Java SDK so this is probably intentional, but worth a comment noting the design choice.
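
One way to address the thread-safety concern in item 1 is to guard both the write path and the read path with the same lock. The sketch below uses a hypothetical `OperationStore` class with string statuses; it is not the SDK's ExecutionState, and it is not necessarily how the follow-up commit resolved the comment.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical stand-in for an execution-state store shared between the
// batcher's background thread (writer) and the workflow thread (reader).
public sealed class OperationStore
{
    private readonly object _gate = new();
    private readonly Dictionary<string, string> _statusById = new();

    // Intended to be called from the batcher callback.
    public void AddOperations(IEnumerable<KeyValuePair<string, string>> operations)
    {
        lock (_gate)
        {
            foreach (var op in operations)
            {
                _statusById[op.Key] = op.Value; // a later checkpoint overwrites an earlier one
            }
        }
    }

    // Intended to be called from the workflow thread; returns null if unknown.
    public string? GetOperation(string id)
    {
        lock (_gate)
        {
            return _statusById.TryGetValue(id, out var status) ? status : null;
        }
    }
}
```

If the batcher already guarantees that the callback finishes before the awaiting workflow code resumes, the lock is redundant at runtime, but it makes the guarantee independent of that ordering.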

Nits

  1. Dockerfiles run as root (Semgrep flagged on 7+ Dockerfiles): same pattern as #2370, test-only containers under Lambda isolation. Low priority.
  2. CallbackConfig rejects negative TimeSpan but doesn't reject TimeSpan.MaxValue. Unlikely in practice, but Math.Ceiling(TimeSpan.MaxValue.TotalSeconds) would overflow int. Consider capping at a reasonable max (e.g., 7 days = 604800s).
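
The overflow in nit 2 arises because TimeSpan.MaxValue.TotalSeconds is roughly 9.2e11, far above int.MaxValue (about 2.1e9). A minimal sketch of the suggested cap follows; `TimeoutSeconds.ToWholeSeconds` is a hypothetical helper, and the 7-day limit is the reviewer's example value, not a documented service limit.

```csharp
using System;

public static class TimeoutSeconds
{
    // 7 days in seconds: the example cap from the review comment (an assumption).
    public const int MaxSeconds = 604_800;

    // Rounds up to whole seconds, then clamps while still a double,
    // so the narrowing cast to int can never be out of range.
    // Assumes negative values were already rejected by validation.
    public static int ToWholeSeconds(TimeSpan timeout)
    {
        double seconds = Math.Ceiling(timeout.TotalSeconds);
        return (int)Math.Min(seconds, MaxSeconds);
    }
}
```

Clamping silently is one option; rejecting values above the cap at configuration time would surface the mistake earlier.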

@normj normj left a comment

Assuming you address Phil's comments

@GarrettBeatty GarrettBeatty requested a review from philasmar May 28, 2026 15:04
@GarrettBeatty

@philasmar addressed your comments in 724d4de

@GarrettBeatty GarrettBeatty merged commit edd1cc0 into feature/durablefunction May 28, 2026
3 of 5 checks passed
@GarrettBeatty GarrettBeatty deleted the gcbeatty/durable-callbacks branch May 28, 2026 15:45