[flink] Fix scan parallelism propagating to downstream operators#7914
[flink] Fix scan parallelism propagating to downstream operators#7914ArnavBalyan wants to merge 1 commit into
Conversation
|
cc @JingsongLi thanks! |
|
@ArnavBalyan Thanks for picking this up! The direction is right, but I think there are a few gaps worth addressing before merging:
The PR only propagates options.get(SCAN_PARALLELISM). However, Paimon's parallelism resolution has two sources — see FlinkTableSource#inferSourceParallelism:
This PR only covers case (1). For case (2) — which is the default behavior most users hit — the inferred value is still applied via DataStream#setParallelism() inside the
After this PR, parallelism is applied in two places:
The DataStream#setParallelism(int) overload sets isParallelismConfigured=false, while the planner path sets it to true. Depending on execution order, the planner's
ParallelismProvider#getParallelism() only fully solves the forward-edge issue from Flink 1.19+ (FLIP-367), where the planner explicitly calls setParallelism(p, true) on the |
|
State compatibility / upgrade risk This fix changes the resulting job graph topology for any existing user who has scan.parallelism set or relies on inferred parallelism:
For users upgrading from an older Paimon version with an existing Flink savepoint/checkpoint, this is a breaking topology change:
This needs to be called out explicitly in the PR description / release notes as a breaking change for stateful streaming jobs. |
Purpose
Tests