fix: surface HPA errors on ScaledObject#7669
Conversation
|
Thank you for your contribution! 🙏 Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer as expected. While you are waiting, make sure to:
Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient. Learn more about our contribution guide. |
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
There was a problem hiding this comment.
Pull request overview
Surfaces managed HPA health problems on the ScaledObject by adjusting the Ready condition when the HPA cannot compute metrics (e.g., ScalingActive=False).
Changes:
- Add HPA health evaluation during
RequestScaleand aggregate it into the ScaledObjectReadycondition. - Introduce new Ready condition reasons:
HPAMetricsUnavailableandScalingDegraded. - Add unit tests for HPA health evaluation and update
CHANGELOG.md.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/scaling/executor/scale_scaledobjects.go | Adds HPA health reading (via autoscaling/v2 HPA status) and Ready-condition aggregation in RequestScale. |
| pkg/scaling/executor/scale_scaledobjects_test.go | Adds tests for HPA health logic and Ready-condition aggregation behavior. |
| apis/keda/v1alpha1/condition_types.go | Adds new ScaledObject Ready condition reason constants used by the executor. |
| CHANGELOG.md | Documents the user-facing behavior change for ScaledObject Ready condition vs HPA health. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c52be98 to
db67926
Compare
|
/run-e2e |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
db67926 to
333abcb
Compare
|
/run-e2e |
333abcb to
0a53c7e
Compare
|
/run-e2e |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/run-e2e |
b04e496 to
0a53c7e
Compare
JorTurFer
left a comment
There was a problem hiding this comment.
I like this idea, but I'm not sure if checking this makes sense here. I mean this will check the HPA on each controller loop, which means an extra request to the API (I know that it's cached).
The HPA is alreade added to ScaledObjectController's reconcilliation loop
keda/controllers/keda/scaledobject_controller.go
Lines 145 to 150 in 1b105b3
Does it make sense to include the change of these fields in predicates and manage changes via watch instead of regular checks?
I did a little bit of digging into the history (more on the issue #7649), apparently that was how KEDA had it before and it was causing too frequent reconciliations and unnecessary condition flapping. Per my testing, HPA really can throw a good number of events leading to extremely frequent reconciles. It was enough to have 200 SOs in cluster with trigger-happy HPAs to take KEDA effectively down.
exactly, because this uses cached client, it's read from memory => 0 extra API calls. |
|
/run-e2e |
00d8b85 to
6cfc4f4
Compare
|
/run-e2e |
|
/run-e2e |
Signed-off-by: Jan Wozniak <wozniak.jan@gmail.com>
Co-authored-by: Rick Brouwer <rickbrouwer@gmail.com> Signed-off-by: Jan Wozniak <wozniak.jan@gmail.com>
6cfc4f4 to
a491504
Compare
|
/run-e2e |
|
/run-e2e ^tests/(internals|secret-providers|scalers/(datadog|[^d][^/]))/._test.go$ skip dynatrace - its CI secret DYNATRACE_HOST returns 404, unrelated to this PR Update: You can check the progress here |
|
/run-e2e internal the only thing that failed in #7669 (comment) were dynatrace tests (they have unrelated issue, 404 to putting metrics to the project). With #7669 (comment) I probably abused regex too much, tests weren't even triggered, so just a last sanity check before merging running for internal e2es |
KEDA's ScaledObject currently reports
Ready=Trueeven when its managed HPA cannot fetch metrics (HPA shows<unknown>targets withScalingActive=False). Users see a healthy-lookingScaledObjectwhile scaling is silently broken, the only way to discover the problem is to manually inspect the HPA. Examples of this might be broken metrics adapter, broken APIService.apiregistration, or invalid certs for the gRPC connection between operator and metrics adapter.HPA broken with adapter not running
Former SO status reporting when HPA is broken
Proposed SO status reporting when HPA is broken
Checklist
make generate-scalers-schemahas been run to update any outdated generated filesFixes #7649