fix(operations): rebuild no-backup instances via fresh PVCs #10295
weicao wants to merge 8 commits into
|
Auto Cherry-pick Instructions |
|
/pick release-1.0 release-1.1 |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10295      +/-   ##
==========================================
- Coverage   52.83%   52.76%   -0.08%
==========================================
  Files         533      533
  Lines       61213    61250      +37
==========================================
- Hits        32343    32319      -24
- Misses      25621    25674      +53
- Partials     3249     3257       +8
|
|
When does this scenario occur? Can it be reproduced? |
|
Yes. This occurs in the no-backup in-place RebuildInstance path when the old source PVC is being removed/recreated, but the legacy code still tries to run the backup-style helper PVC/PV handoff. The failure condition is: helper pod finishes, then It was reproduced twice with a MySQL semisync test on the release-1.0/1.0.3 line:
This PR changes only the no-backup path: it stops creating helper tmp PVCs/pods, records the old source PVC identity, releases the old target pod first, waits for the workload to recreate the same-name source PVC from the current template, and then waits for the rebuilt pod. The backup/helper-PV path is left unchanged. |
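To make the "same-name source PVC recreated from the current template" condition concrete, here is a minimal sketch of how a recorded PVC identity could be compared against what the controller later observes. The types `pvcIdentity` and `observedPVC`, the function `checkRecreated`, and the example names are hypothetical and are not taken from this PR; the only assumptions carried over from the thread are that the old identity is recorded, that the new claim must be a different object with the same name, and that rebinding to the old PV should be rejected.

```go
package main

import "fmt"

// pvcIdentity is a hypothetical snapshot of the fields that distinguish the
// old source PVC from a later object that reuses the same name.
type pvcIdentity struct {
	Name       string
	UID        string
	VolumeName string // PV the old claim was bound to; may be empty
}

// observedPVC is a hypothetical, trimmed-down view of a PVC as the
// controller currently sees it.
type observedPVC struct {
	Name       string
	UID        string
	Phase      string // for example "Pending" or "Bound"
	VolumeName string
	Deleting   bool // true when a deletion timestamp is set
}

// checkRecreated reports whether cur is a fresh, bound replacement for old.
// It returns (false, nil) while the caller should keep waiting, and an error
// when the new claim has been bound back to the old PV.
func checkRecreated(old pvcIdentity, cur *observedPVC) (bool, error) {
	if cur == nil || cur.Deleting {
		// The old claim is gone or going away; nothing has replaced it yet.
		return false, nil
	}
	if cur.UID == old.UID {
		// Same object as before, so the workload has not recreated it.
		return false, nil
	}
	if old.VolumeName != "" && cur.VolumeName == old.VolumeName {
		return false, fmt.Errorf("pvc %s was rebound to old pv %s", cur.Name, old.VolumeName)
	}
	return cur.Phase == "Bound", nil
}

func main() {
	old := pvcIdentity{Name: "data-mysql-0", UID: "uid-old", VolumeName: "pv-old"}
	fresh := &observedPVC{Name: "data-mysql-0", UID: "uid-new", Phase: "Bound", VolumeName: "pv-new"}

	ready, err := checkRecreated(old, fresh)
	fmt.Println(ready, err)
}
```

Comparing UIDs rather than names is what lets the check tell a recreated claim apart from the original, since both share the same name by design.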
|
Clarification to the previous explanation: the old failure should be described as a restored-PV lookup failure, not as proof that the helper PV could never be created. The old code path is:
So the reproduced symptom is clear, but the old helper-PV path root cause is narrower than my earlier wording: we did not prove that the PV was impossible to create; we proved that the controller could not resolve a transferable restored PV through the old lookup contract in that no-backup rebuild run. This does not change the current PR direction. For no-backup rebuild, there is no backup data to restore or transfer, so requiring the backup-style tmp PVC/helper PV handoff is unnecessary. This PR removes that dependency only for the no-backup path and instead rebuilds through workload-owned source PVC deletion/recreation. The backup path continues to use the helper/restored-PV handoff. |
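The distinction between "the PV could not be created" and "the controller could not resolve the PV" can be illustrated with a small sketch. This is not the controller's actual lookup code: `claim`, `resolvePV`, `errPVNotResolved`, and the set of visible PVs are hypothetical, and the lookup contract assumed here (a PV is found only through the claim's `spec.volumeName` among PVs the controller can see) is a simplification.

```go
package main

import (
	"errors"
	"fmt"
)

// errPVNotResolved reuses the wording of the reported symptom. It says the
// lookup failed, not that the PV does not exist anywhere.
var errPVNotResolved = errors.New("can not found the pv by the pvc")

// claim is a hypothetical, minimal stand-in for a PVC.
type claim struct {
	Name       string
	VolumeName string // mirrors spec.volumeName; empty until the claim is bound
}

// resolvePV models the assumed lookup contract: the PV is located only via
// the claim's volume name, and only among PVs visible to the controller.
func resolvePV(c claim, visiblePVs map[string]bool) (string, error) {
	if c.VolumeName == "" {
		return "", fmt.Errorf("%w %s: volume name not set", errPVNotResolved, c.Name)
	}
	if !visiblePVs[c.VolumeName] {
		return "", fmt.Errorf("%w %s: pv %s not visible", errPVNotResolved, c.Name, c.VolumeName)
	}
	return c.VolumeName, nil
}

func main() {
	// The helper PV may exist in the cluster but be absent from the
	// controller's view; the lookup fails either way.
	visible := map[string]bool{"pv-source": true}

	_, err := resolvePV(claim{Name: "tmp-data-mysql-0", VolumeName: "pv-helper"}, visible)
	fmt.Println(err)
	fmt.Println(errors.Is(err, errPVNotResolved))
}
```

Both failure branches surface the same symptom, which is why the symptom alone does not prove that the helper PV was never created.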
|
Your modification approach has issues. Here's what I suggest: |
|
Updated in What changed:
The implementation still uses the existing helper receiver as the local holder for target pod / component / PVC-template context, but the no-backup path does not create helper tmp PVCs, helper pods, or helper PVs. Local validation passed on
GitHub CI is rerunning on the new head. |
|
Suggestions: in rebuild_instance.go, make only the following two changes. First, in the getPVCMapAndVolumes function within prepareInplaceRebuildHelper, if the backup is empty, do not obtain volumes and tmpPVC. Second, modify the rebuildSourcePVCsByDynamicProvision logic as follows: call setInstanceNodeSelectorForRebuild; then, for the sourcePVC in pvcMap, delete it if it exists; finally, call deleteTargetPodForRebuild to delete the Pod and set progressDetail.Message = waitingForInstanceReadyMessage. Do not change anything else. |
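The ordering suggested above can be sketched as follows. Only the step names come from the review comment; the `rebuildOps` interface, every signature, the `recorder` fake, the function name `noBackupRebuildSketch`, and the text of `waitingForInstanceReadyMessage` are assumptions made for illustration and do not reflect the real code in rebuild_instance.go.

```go
package main

import "fmt"

// The real value of this constant is not shown in the thread; placeholder text.
const waitingForInstanceReadyMessage = "waiting for instance ready"

// progressDetail is a hypothetical stand-in holding only the Message field.
type progressDetail struct{ Message string }

// rebuildOps is a hypothetical seam. Method names follow the review comment
// where it gives them; all signatures are assumed.
type rebuildOps interface {
	setInstanceNodeSelectorForRebuild() error
	sourcePVCExists(name string) (bool, error)
	deleteSourcePVC(name string) error
	deleteTargetPodForRebuild() error
}

// noBackupRebuildSketch applies the suggested order: node selector first,
// then source PVC deletion, then target pod deletion, then the message.
func noBackupRebuildSketch(ops rebuildOps, sourcePVCs []string, pd *progressDetail) error {
	if err := ops.setInstanceNodeSelectorForRebuild(); err != nil {
		return err
	}
	for _, name := range sourcePVCs {
		exists, err := ops.sourcePVCExists(name)
		if err != nil {
			return err
		}
		if !exists {
			continue // nothing to delete for this claim
		}
		if err := ops.deleteSourcePVC(name); err != nil {
			return err
		}
	}
	if err := ops.deleteTargetPodForRebuild(); err != nil {
		return err
	}
	pd.Message = waitingForInstanceReadyMessage
	return nil
}

// recorder is an in-memory fake that logs the order of calls.
type recorder struct {
	existing map[string]bool
	calls    []string
}

func (r *recorder) setInstanceNodeSelectorForRebuild() error {
	r.calls = append(r.calls, "setNodeSelector")
	return nil
}

func (r *recorder) sourcePVCExists(name string) (bool, error) {
	return r.existing[name], nil
}

func (r *recorder) deleteSourcePVC(name string) error {
	r.calls = append(r.calls, "deletePVC:"+name)
	delete(r.existing, name)
	return nil
}

func (r *recorder) deleteTargetPodForRebuild() error {
	r.calls = append(r.calls, "deletePod")
	return nil
}

func main() {
	r := &recorder{existing: map[string]bool{"data-mysql-0": true}}
	pd := &progressDetail{}
	if err := noBackupRebuildSketch(r, []string{"data-mysql-0", "log-mysql-0"}, pd); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.calls, pd.Message)
}
```

Putting the steps behind a small interface keeps the ordering testable without a cluster, which is the only reason the fake exists here.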
|
Updated in What changed:
One small guard remains in Local validation passed on
GitHub CI is rerunning on the new head. |
|
Updated in Changes:
Local validation:
GitHub CI is running on the new head. |
|
replace |
|
Updated in Changes:
Local validation:
GitHub CI is running on the new head. |
Problem
No-backup in-place RebuildInstance has no backup data to restore. The old flow still created temporary PVCs and a helper pod, then tried to hand off a helper PV to the source PVC. In environments where that helper PV is not visible to the controller, the operation can fail with
can not found the pv by the pvc ....
For no-backup rebuild, the intended result is a fresh source PVC from the current workload template, not a restored helper PV handoff.
Changes
Bound, then delete the target pod and wait for the rebuilt pod to become ready.
spec.volumeName are visible, reject rebinding to the old PV.
Validation
KUBEBUILDER_ASSETS="$(.../setup-envtest-release-0.21 use 1.26.1 -p path)" go test ./pkg/operations -run TestAPIs -ginkgo.focus "test rebuild instance with no backup" -count=1 -v
KUBEBUILDER_ASSETS="$(.../setup-envtest-release-0.21 use 1.26.1 -p path)" go test ./pkg/operations -run TestAPIs -count=1
go test -c ./pkg/operations
git diff --check
Fixes #10293.