Skip to content

feat: define child workflow state machine#99

Open
timl3136 wants to merge 2 commits intocadence-workflow:mainfrom
timl3136:child-wf-1
Open

feat: define child workflow state machine#99
timl3136 wants to merge 2 commits intocadence-workflow:mainfrom
timl3136:child-wf-1

Conversation

@timl3136
Copy link
Copy Markdown
Member

@timl3136 timl3136 commented May 5, 2026

What changed?
define child workflow state machine

Why?
essential step for implement child wf

How did you test it?
unit tests

Potential risks

Release notes

Documentation Changes

Signed-off-by: Tim Li <ltim@uber.com>
Comment thread cadence/_internal/workflow/statemachine/child_workflow_execution_state_machine.py Outdated
Comment thread cadence/_internal/workflow/statemachine/nondeterminism.py
request: decision.StartChildWorkflowExecutionDecisionAttributes
execution: DecisionFuture[WorkflowExecution]
result: DecisionFuture[Payload]
_initiated_event_id: int | None = None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this is unused

StartChildWorkflowExecutionFailed,
)

child_workflow_events = EventDispatcher("workflow_id")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass "initiated_event_id" as the parameter here you won't have to repeat it on all the decorators.

domain=self.request.domain,
workflow_execution=WorkflowExecution(
workflow_id=self.request.workflow_id,
run_id="",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting case. We don't have a RunID (as it hasn't started yet) but we'd like to cancel it. Does this match how the other clients handle it?

Since transfer tasks can be executed in parallel I wonder if this would create a race condition on the server. We might try canceling the Workflow before it has ever started. We might need to wait for it to start before we can cancel it.

)
if self.state is DecisionState.CANCELED_AFTER_REQUESTED:
return record_immediate_cancel(self.request)
if self.state is DecisionState.CANCELED_AFTER_RECORDED:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to model the CANCELED_AFTER_STARTED scenario to cancel an in-flight Child workflow. At that point we'd have the RunID.

return True

if self.state is DecisionState.STARTED:
self._transition(DecisionState.CANCELED_AFTER_RECORDED)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I mentioned this above, but this may be a separate state because the Workflow has already started.

self._transition(DecisionState.COMPLETED)
self.result.set_exception(ChildWorkflowExecutionTerminated())

@child_workflow_events.event("workflow_execution")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state machine is never keyed by the WorkflowExecution, only the WorkflowID. To dispatch this event it would grab the value of the worklfow_execution field and look it up in the aliases dict in DecisionManager.

We likely need to extend the EventDispatcher and/or DecisionManager to handle this case.

Oh this is what gitar is commenting on below.



@to_expectation.register
def _(attrs: history.StartChildWorkflowExecutionFailedEventAttributes) -> Expectation:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expectation value shouldn't be CANCEL here and we need to add Expectation building for the other cancellation events.

Comment thread cadence/error.py Outdated
Comment on lines +279 to +288
def child_wf_initiated(
event_id: int, workflow_id: str, *, initiated_event_id: int
) -> history.HistoryEvent:
return history.HistoryEvent(
event_id=event_id,
start_child_workflow_execution_initiated_event_attributes=history.StartChildWorkflowExecutionInitiatedEventAttributes(
workflow_id=workflow_id,
workflow_type=WorkflowType(name="MyWorkflow"),
),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: Test helper has unused initiated_event_id parameter

The child_wf_initiated helper function accepts an initiated_event_id keyword argument that is never used in the function body. This is misleading because callers pass values like initiated_event_id=1, suggesting the parameter configures something, but it has no effect. The StartChildWorkflowExecutionInitiatedEventAttributes proto doesn't have an initiated_event_id field — it IS the initiated event, and its event_id is set via the outer HistoryEvent(event_id=...).

Suggested fix:

def child_wf_initiated(
    event_id: int, workflow_id: str
) -> history.HistoryEvent:
    return history.HistoryEvent(
        event_id=event_id,
        start_child_workflow_execution_initiated_event_attributes=history.StartChildWorkflowExecutionInitiatedEventAttributes(
            workflow_id=workflow_id,
            workflow_type=WorkflowType(name="MyWorkflow"),
        ),
    )

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 6, 2026

CI failed: The CI build failed due to formatting inconsistencies in the newly added files, which triggered a linting error. Running the project's formatting tools locally will resolve the issue.

Overview

One failure was identified in the CI build process, caused by unformatted code in the new feature branch. Total logs analyzed: 1.

Failures

Formatting Check Failure (confidence: high)

  • Type: tooling
  • Affected jobs: 74705760474
  • Related to change: yes
  • Root cause: The committed code violates the project's style standards. Specifically, the CI runner identified 4 files that require reformatting to match the current style configuration.
  • Suggested fix: Run the project's automated formatting command (as indicated in the CI logs: uv run python scripts/format.py) locally to reformat the modified files before committing and pushing the changes.

Summary

  • Change-related failures: 1 (formatting violation in new files).
  • Infrastructure/flaky failures: 0.
  • Recommended action: Execute the project's linting/formatting script locally, commit the resulting changes, and push to the branch.
Code Review 👍 Approved with suggestions 2 resolved / 3 findings

Defines the child workflow state machine with improved event dispatching and error handling, addressing previous runtime routing and expectation logic issues. Remove the unused initiated_event_id parameter from the test helper function.

💡 Quality: Test helper has unused initiated_event_id parameter

📄 tests/cadence/_internal/workflow/statemachine/test_decision_manager.py:279-288

The child_wf_initiated helper function accepts an initiated_event_id keyword argument that is never used in the function body. This is misleading because callers pass values like initiated_event_id=1, suggesting the parameter configures something, but it has no effect. The StartChildWorkflowExecutionInitiatedEventAttributes proto doesn't have an initiated_event_id field — it IS the initiated event, and its event_id is set via the outer HistoryEvent(event_id=...).

Suggested fix
def child_wf_initiated(
    event_id: int, workflow_id: str
) -> history.HistoryEvent:
    return history.HistoryEvent(
        event_id=event_id,
        start_child_workflow_execution_initiated_event_attributes=history.StartChildWorkflowExecutionInitiatedEventAttributes(
            workflow_id=workflow_id,
            workflow_type=WorkflowType(name="MyWorkflow"),
        ),
    )
✅ 2 resolved
Bug: handle_cancel_initiated routing will fail at runtime

📄 cadence/_internal/workflow/statemachine/child_workflow_execution_state_machine.py:168-178
The handle_cancel_initiated handler uses @child_workflow_events.event("workflow_execution") as its id_attr. At dispatch time, DecisionManager.handle_history_event will do getattr(attrs, "workflow_execution") which yields a WorkflowExecution protobuf object. The alias lookup uses (decision_type, id_for_event) as the key, so it will look for (CHILD_WORKFLOW, <WorkflowExecution object>). However, the state machine is registered under (CHILD_WORKFLOW, "child-wf-1") (a string). These won't match, so the event will never be routed to the state machine.

Additionally, since handle_cancel_initiated doesn't set event_id_is_alias=True, the cancel-initiated event's ID is never registered as an alias. This means handle_cancel_failed (which looks up by "initiated_event_id") will also fail to route.

The tests pass because they call sm.handle_cancel_initiated(...) directly, bypassing the dispatcher routing.

Bug: StartChildWorkflowExecutionFailed uses CANCEL expectation incorrectly

📄 cadence/_internal/workflow/statemachine/nondeterminism.py:288-293
In nondeterminism.py, the to_expectation registration for StartChildWorkflowExecutionFailedEventAttributes uses CANCEL (i.e., {"canceled": True}) as its properties. However, CANCEL is semantically used for cancellation operations (e.g., ActivityTaskCancelRequestedEventAttributes, CancelTimerFailedEventAttributes). An initiation failure is not a cancellation — it should probably carry the same properties as the initiation expectation ({"workflow_type": attrs.workflow_type.name}) to properly enforce determinism. If a replayed history has a start-child-failed event but the new code expects a different workflow type, the current CANCEL expectation won't catch that mismatch.

🤖 Prompt for agents
Code Review: Defines the child workflow state machine with improved event dispatching and error handling, addressing previous runtime routing and expectation logic issues. Remove the unused `initiated_event_id` parameter from the test helper function.

1. 💡 Quality: Test helper has unused `initiated_event_id` parameter
   Files: tests/cadence/_internal/workflow/statemachine/test_decision_manager.py:279-288

   The `child_wf_initiated` helper function accepts an `initiated_event_id` keyword argument that is never used in the function body. This is misleading because callers pass values like `initiated_event_id=1`, suggesting the parameter configures something, but it has no effect. The `StartChildWorkflowExecutionInitiatedEventAttributes` proto doesn't have an `initiated_event_id` field — it IS the initiated event, and its event_id is set via the outer `HistoryEvent(event_id=...)`.

   Suggested fix:
   def child_wf_initiated(
       event_id: int, workflow_id: str
   ) -> history.HistoryEvent:
       return history.HistoryEvent(
           event_id=event_id,
           start_child_workflow_execution_initiated_event_attributes=history.StartChildWorkflowExecutionInitiatedEventAttributes(
               workflow_id=workflow_id,
               workflow_type=WorkflowType(name="MyWorkflow"),
           ),
       )

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants