Skip to content

[LMCROSSITXSADEPLOY-3316] Introduce health-check-interval module parameter#1853

Open
IvanBorislavovDimitrov wants to merge 2 commits into
masterfrom
feature/LMCROSSITXSADEPLOY-3316
Open

[LMCROSSITXSADEPLOY-3316] Introduce health-check-interval module parameter#1853
IvanBorislavovDimitrov wants to merge 2 commits into
masterfrom
feature/LMCROSSITXSADEPLOY-3316

Conversation

@IvanBorislavovDimitrov
Copy link
Copy Markdown
Contributor

Summary

Introduces the health-check-interval MTA module parameter, achieving feature parity with Cloud Foundry's native v3 process HealthCheck.Data.interval field. The parameter is an integer (seconds) and follows the same kebab-case / omit-when-absent semantics as the other health-check-* parameters already supported by the controller.

modules:
  - name: my-app
    type: application
    parameters:
      health-check-interval: 15

The introduction was previously blocked because the underlying CF Java client did not expose the field; recent versions of the client now support it via Data.Builder.interval(...), which is already used on the readiness side.

Linked Jira: LMCROSSITXSADEPLOY-3316Introduce Health check interval

Changes

multiapps-controller-core

  • SupportedParameters — declares the new health-check-interval parameter so the parameter validator accepts it on application modules.
  • StagingParametersParser — parses the parameter through PropertiesUtil (returns null when absent, preserving omit-when-absent semantics) and propagates it onto the Staging request object.

multiapps-controller-client

  • Staging — adds the request-side healthCheckInterval field with builder support so the parsed value flows from parser to the CF API call.
  • CloudProcess / RawCloudProcess — adds the response-side healthCheckInterval field and derives it from the v3 process payload when reading process state back from CF.
  • HealthCheckInfo — includes healthCheckInterval in the equality / diff used by StagingApplicationAttributeUpdater, so that changing only the interval triggers a staging update (matching the behaviour of the other health-check fields).
  • CloudControllerRestClientImpl#buildHealthCheck — passes the value into Data.Builder.interval(...) when assembling the v3 health-check object. Builder accepts null and skips serialisation in that case, so the on-the-wire shape is unchanged when the parameter is absent.

Tests

Unit tests added / extended in this PR (separate test commit):

Test class Cases Coverage
HealthCheckInfoTest (new) 9 Equality / diff logic for the staging-update trigger; interval-only changes; null-interval round-trips.
RawCloudProcessTest (new) 3 Response-side derivation of the interval from the v3 process payload.
StagingParametersParserTest (extended) +2 Parsing of health-check-interval (present and absent / null cases).

Scoped Maven verification:

mvn -pl multiapps-controller-client -am test \
    -Dtest='HealthCheckInfoTest,RawCloudProcessTest'
mvn -pl multiapps-controller-core test \
    -Dtest=StagingParametersParserTest

All targeted tests pass. Note for reviewers: the multiapps-controller-core module currently has pre-existing test-compile issues in unrelated classes (HealthRetrieverTest, ApplicationConfigurationTest, MtaDescriptorPropertiesResolverTest) — these surface only on a full module test compile and are out of scope for this change.

Acceptance criteria

Per the Jira ticket, one should be able to deploy an MTA with the snippet above and have the resulting CF app be configured with a 15-second health-check interval. The runtime path is exercised end-to-end by the parser → Staging → buildHealthCheck → CF Java client Data.Builder.interval chain covered above.

Backwards compatibility

When the parameter is absent, PropertiesUtil returns null, Data.Builder.interval(null) does not serialise the field, and the existing HealthCheckInfo equality semantics treat null-vs-null as equal. Existing MTAs and existing app states are therefore unaffected — no spurious staging updates are triggered on first deploy after the upgrade.

Introduce the health-check-interval MTA module parameter (kebab-case,
integer seconds). It mirrors how Cloud Foundry's native
health-check-interval works on the v3 process HealthCheck.Data object.

The parameter is declared in SupportedParameters, parsed by
StagingParametersParser, propagated through Staging (request side) and
CloudProcess / RawCloudProcess (response side), included in the
HealthCheckInfo diff used by StagingApplicationAttributeUpdater (so that
changing only the interval triggers a staging update), and pushed to CF
via CloudControllerRestClientImpl#buildHealthCheck using the existing
Data.Builder.interval(...) method (already in use for the readiness
side).

When the parameter is absent, PropertiesUtil returns null and
Data.Builder.interval(null) does not serialize the field, preserving the
omit-when-absent semantics consistent with the other health-check-*
parameters.

JIRA:LMCROSSITXSADEPLOY-3316
Cover the new health-check-interval module parameter:

- HealthCheckInfoTest: 9 cases exercising the diff/equality logic that
  drives StagingApplicationAttributeUpdater, including interval-only
  changes triggering an update and null-interval round-trips.
- RawCloudProcessTest: 3 cases covering the response-side derivation of
  the interval from the v3 process payload.
- StagingParametersParserTest: 2 added cases for parsing the new
  parameter (present and absent).

JIRA:LMCROSSITXSADEPLOY-3316
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 1, 2026

@IvanBorislavovDimitrov
Copy link
Copy Markdown
Contributor Author

oq test verdict: FAIL

Recommendation: do not merge — deploy step failed before any tests could run; re-run after clearing the local Maven cache.

PR: #1853 @ bb90b413bab1d19456db3afcbc6a5bdc5e45ae1d
CF target: deploy-service / sap_btp_cf_mta_deploy+technical1 (app: deploy-service)
Window: n/a (no deploy window — build never produced a WAR)

Pipeline outcomes

Stage Result Notes
Deploy (deploy-service-pusher-oq) FAIL Local Maven build aborted in multiapps-controller-persistence due to an unresolved com.google.cloud.http.BaseHttpServiceException (transitive of com.google.cloud.storage.StorageException); no WAR was produced and no cf push was attempted.
Tests (qa-tester) SKIPPED Deploy did not pass; tests not executed.
Log analysis (log-analyzer) SKIPPED Deploy failed during local Maven build before any cf push, so there are no cloud-logging entries from this PR to analyze.

Verdict rationale

The Maven build failed in the multiapps-controller-persistence module while compiling GcpObjectStoreFileStorageTest, which references com.google.cloud.http.BaseHttpServiceException — a class that could not be resolved on the classpath. This is an environmental / .m2 dependency-resolution problem, not a defect introduced by PR 1853: the PR's diff does not touch the persistence module or any GCP-related code. Because no WAR was built, the OQ tests and log analysis stages were correctly skipped. The verdict is FAIL strictly because the pipeline could not produce a trustworthy signal — there is no evidence against the PR itself.

Operator next steps

  • Clean the local .m2 cache entries for com.google.cloud.* artifacts (e.g. rm -rf ~/.m2/repository/com/google/cloud) and let Maven re-resolve.
  • Re-run /oq-tests against this PR.
  • If the same failure reproduces on a clean cache, investigate whether master itself has a transient dependency-resolution problem (e.g. a BOM update or a repository manager hiccup) — this would be independent of PR 1853.

PR change surface

The PR (feature/LMCROSSITXSADEPLOY-3316) touches multiapps-controller-client (Staging, CloudProcess, RawCloudProcess, HealthCheckInfo, CloudControllerRestClientImpl) and multiapps-controller-core (SupportedParameters, StagingParametersParser), plus the matching tests. It does not touch multiapps-controller-persistence or any GCP-related code, so the build failure above is not attributable to this PR.

Log analysis summary

Log analysis was skipped because the build failed before any cf push, so no cloud-logging entries exist for this PR run. Counts are not meaningful and are omitted.


Posted by pr-result-publisher via orchestrator (publisher's GitHub MCP was unavailable). Mode: oq. Generated 2026-06-01 UTC.

@IvanBorislavovDimitrov
Copy link
Copy Markdown
Contributor Author

oq test verdict: PASS_WITH_WARNINGS

Recommendation: hold for review — see warnings below.

PR: #1853 @ bb90b413bab1d19456db3afcbc6a5bdc5e45ae1d
CF target: deploy-service / sap_btp_cf_mta_deploy+technical1 (app: deploy-service)
Window: 2026-06-01T16:49:08Z → 2026-06-01T17:30:14Z
Pipeline: http://gcpclm950064:8080/teams/main/pipelines/qa-tester

Pipeline outcomes

Stage Result Notes
Deploy (deploy-service-pusher-oq) PASS Build chain succeeded; cf push succeeded with explicit route override (deploy-service-qa-tester.cfapps.sap.hana.ondemand.com) after an earlier route-mapping rejection. App was up and running before OQ tests started.
Tests (qa-tester) FAIL 38 of 55 retried jobs failed both initial and retry; 17 passed on retry. Root cause was not regressions — see log analysis.
Log analysis (log-analyzer) FAIL 0 regression suspects in 50,343 log entries. The apparent "deploy-service crash at 17:23:11Z" was actually a deliberate manual stop by another operator at 17:05:04Z, which truncated mid-run scenarios and explains the 38 simultaneous failures. PR 1853's change surface is absent from all stack traces.

Verdict rationale

The OQ test stage reports FAIL (38 of 55 retried jobs failed both attempts), but log-analyzer's investigation establishes that the failures are not attributable to this PR. CF audit events show the deploy-service app was deliberately stopped by another operator at 17:05:04Z — 16 minutes into the 41-minute window — and that stop truncated cloud-logging ingest at 17:05:09Z. There are zero app.crash events on 2026-06-01, and none of the PR's change surface (Staging.healthCheckInterval, HealthCheckInfo, RawCloudProcess, CloudControllerRestClientImpl, StagingParametersParser) appears in any stack trace within the window. The single regression suspect (the "instances down at 17:23:11Z" signal) was decided UNLIKELY_CAUSED_BY_PR by log-analyzer, so the verdict is downgraded from FAIL to PASS_WITH_WARNINGS — but this run does not produce a valid pass/fail signal for the PR and must be re-run on a CF target that is not being shared concurrently.

Failed jobs / scenarios (38)

app-configurable-wait-after-stop, app-features-scenario, app-stack-update, app-staging-failure, app-with-buildpacks, application-hooks, application-hooks-with-extension-descriptors, async-service-bindings-scenario, async-service-keys-scenario, bg-deploy-after-token-expiration, bg-deploy-keep-existing-app-names, bg-deploy-stop-reorder, blue-green-deploy, blue-green-deploy-with-route, cleaners-and-clean-up-job, cts-blue-green, cts-chunked-deploy, cts-custom-idp-authentication, cts-multipart-file-uploads, cts-multiple-mtas-deploy, gacd-in-deployed-after, hook-target-app, http2-routes, keep-existing-application-attributes-scenario, namespace-multiple-deploys, occasional-message-for-non-finishing-task-execution, only-async-services-scenario, optional-mta-resources-scenario, passing-secrets-during-deployment, rollback-mta-scenario, selective-deployment-scenario, shared-private-domain-scenario, update-service-scenario, whitelisting-default-visibility-scenario, whitelisting-visibility-failure-scenario, whitelisting-visibility-in-all-orgs-scenario, whitelisting-visibility-in-current-org-space-scenario, zip-attacks-scenario

Pre-merge action items

  • Re-run OQ on a non-shared CF target. This run was invalidated 16 minutes in by an out-of-band manual stop of deploy-service by another operator. The OQ failures are not a verdict on this PR.
  • Commit the version-pin fixes that were uncommitted in the working tree before merge. Log-analyzer flagged three version-pin mismatches in the committed state of the feature branch — multiapps-controller:multiapps.version, xsa-multiapps-controller:multiapps.version, and xsa-multiapps-controller:multiapps-controller.version. The deployed WAR was version-consistent only because of the on-disk pom edits made by the pusher; CI will not see those.

Log analysis summary

  • Expected (test-driven): 115
  • Infrastructure / transient: 50,228
  • Potentially regression-related: 0
    • Likely caused by PR: 0
    • Unlikely caused by PR: 1
    • Inconclusive: 0
  • Version skew: multiapps-controller:multiapps.version, xsa-multiapps-controller:multiapps.version, xsa-multiapps-controller:multiapps-controller.version (committed-state only; deployed build was version-consistent via uncommitted working-tree pom fixes)
Detailed log-analyzer findings

CF event record around the apparent "crash"

  • 2026-06-01T17:05:04Zaudit.app.stop by actor radito3@mail.bg (deliberate manual stop)
  • Zero app.crash or audit.app.process.crash events on 2026-06-01
  • Last cloud-logging entry from the app: 2026-06-01T17:05:09Z (consistent with clean stop, not crash)

Per-suspect attribution

# Signature Verdict Reasoning
1 Deploy-service instances down at 17:23:11Z UNLIKELY_CAUSED_BY_PR CF audit events show manual stop by radito3@mail.bg at 17:05:04Z. Zero crash events on 2026-06-01. Log ingest terminated cleanly at 17:05:09Z, consistent with graceful stop. No error pattern from the PR's change surface appears anywhere in the 50,343-entry log window.
2 Version-pin skew in committed feature-branch POMs (3 mismatches) UNLIKELY_CAUSED_BY_PR (runtime) / process risk (pre-merge) Not a runtime regression for this run (uncommitted fixes were deployed). Becomes a real regression if PR is merged without committing the pom fixes.

Strong attributions (LIKELY_CAUSED_BY_PR — investigate first)

None. No suspects in Bucket C. No stack frame from the PR's change surface appears in any log entry in the window.


Posted by pr-result-publisher via orchestrator (publisher's GitHub MCP was unavailable). Mode: oq. Generated 2026-06-01 UTC.

@IvanBorislavovDimitrov
Copy link
Copy Markdown
Contributor Author

MTA Quality Report — cloudfoundry/multiapps-controller PR #1853

Jira: LMCROSSITXSADEPLOY-3316 — Introduce Health check interval
Backlog alignment: PASS

  • Implements Jira scope? yes — adds the health-check-interval MTA module parameter end-to-end (parser → StagingbuildHealthCheck → CF v3 Data.interval), exactly the scenario in the Jira description.
  • Changes outside Jira scope? no — all touched files (SupportedParameters, StagingParametersParser, Staging, CloudProcess, RawCloudProcess, HealthCheckInfo, CloudControllerRestClientImpl, plus matching tests) are scoped to the new parameter; no unrelated drive-by changes.

Code Review

⚠️ Subagent failed: code-reviewer Task tool not available in this orchestrator environment — analysis could not be delegated. Per the orchestrator's scope-discipline rules, the orchestrator must not analyse the diff itself, so no findings are emitted for this section. Re-run from an environment with the Agent/Task tool to obtain code-review findings.


Security

⚠️ Subagent failed: security-reviewer Task tool not available in this orchestrator environment — analysis could not be delegated.


SonarCloud

Check Status Conclusion
SonarCloud Code Analysis completed success
SonarCloud completed success
CodeQL completed success
Build and analyze completed success
Analyze (java) completed success
build completed success
Check Commit Message completed success

All SonarCloud / CodeQL / build check runs for head SHA bb90b413bab1d19456db3afcbc6a5bdc5e45ae1d reported success. Annotations unavailable: the orchestrator's GitHub MCP exposes only check-run conclusions, not Sonar issue annotations; per-issue findings could not be enumerated without the sonar-reviewer subagent.


Dependency CVEs

✅ No new CVEs — this PR does not modify any dependency files (pom.xml, build.gradle, lockfiles).


Note: three of four subagents (code-reviewer, security-reviewer, sonar-reviewer) could not be spawned because the Agent/Task tool is not available in this orchestrator's tool surface. The Dependency CVEs section is fully reliable (gate decided by file-list inspection in Step 4.5). The Backlog Alignment verdict is the orchestrator's own structured assessment from Step 5 and does not depend on a subagent. The remaining sections are best-effort and should be re-run from an environment that exposes the Task tool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant