[LMCROSSITXSADEPLOY-3316] Introduce health-check-interval module parameter#1853
[LMCROSSITXSADEPLOY-3316] Introduce health-check-interval module parameter#1853IvanBorislavovDimitrov wants to merge 2 commits into
Conversation
Introduce the health-check-interval MTA module parameter (kebab-case, integer seconds). It mirrors how Cloud Foundry's native health-check-interval works on the v3 process HealthCheck.Data object. The parameter is declared in SupportedParameters, parsed by StagingParametersParser, propagated through Staging (request side) and CloudProcess / RawCloudProcess (response side), included in the HealthCheckInfo diff used by StagingApplicationAttributeUpdater (so that changing only the interval triggers a staging update), and pushed to CF via CloudControllerRestClientImpl#buildHealthCheck using the existing Data.Builder.interval(...) method (already in use for the readiness side). When the parameter is absent, PropertiesUtil returns null and Data.Builder.interval(null) does not serialize the field, preserving the omit-when-absent semantics consistent with the other health-check-* parameters. JIRA:LMCROSSITXSADEPLOY-3316
Cover the new health-check-interval module parameter: - HealthCheckInfoTest: 9 cases exercising the diff/equality logic that drives StagingApplicationAttributeUpdater, including interval-only changes triggering an update and null-interval round-trips. - RawCloudProcessTest: 3 cases covering the response-side derivation of the interval from the v3 process payload. - StagingParametersParserTest: 2 added cases for parsing the new parameter (present and absent). JIRA:LMCROSSITXSADEPLOY-3316
|
|
| Stage | Result | Notes |
|---|---|---|
Deploy (deploy-service-pusher-oq) |
FAIL | Local Maven build aborted in multiapps-controller-persistence due to an unresolved com.google.cloud.http.BaseHttpServiceException (transitive of com.google.cloud.storage.StorageException); no WAR was produced and no cf push was attempted. |
Tests (qa-tester) |
SKIPPED | Deploy did not pass; tests not executed. |
Log analysis (log-analyzer) |
SKIPPED | Deploy failed during local Maven build before any cf push, so there are no cloud-logging entries from this PR to analyze. |
Verdict rationale
The Maven build failed in the multiapps-controller-persistence module while compiling GcpObjectStoreFileStorageTest, which references com.google.cloud.http.BaseHttpServiceException — a class that could not be resolved on the classpath. This is an environmental / .m2 dependency-resolution problem, not a defect introduced by PR 1853: the PR's diff does not touch the persistence module or any GCP-related code. Because no WAR was built, the OQ tests and log analysis stages were correctly skipped. The verdict is FAIL strictly because the pipeline could not produce a trustworthy signal — there is no evidence against the PR itself.
Operator next steps
- Clean the local
.m2cache entries forcom.google.cloud.*artifacts (e.g.rm -rf ~/.m2/repository/com/google/cloud) and let Maven re-resolve. - Re-run
/oq-testsagainst this PR. - If the same failure reproduces on a clean cache, investigate whether
masteritself has a transient dependency-resolution problem (e.g. a BOM update or a repository manager hiccup) — this would be independent of PR 1853.
PR change surface
The PR (feature/LMCROSSITXSADEPLOY-3316) touches multiapps-controller-client (Staging, CloudProcess, RawCloudProcess, HealthCheckInfo, CloudControllerRestClientImpl) and multiapps-controller-core (SupportedParameters, StagingParametersParser), plus the matching tests. It does not touch multiapps-controller-persistence or any GCP-related code, so the build failure above is not attributable to this PR.
Log analysis summary
Log analysis was skipped because the build failed before any cf push, so no cloud-logging entries exist for this PR run. Counts are not meaningful and are omitted.
Posted by pr-result-publisher via orchestrator (publisher's GitHub MCP was unavailable). Mode: oq. Generated 2026-06-01 UTC.
|
| Stage | Result | Notes |
|---|---|---|
Deploy (deploy-service-pusher-oq) |
PASS | Build chain succeeded; cf push succeeded with explicit route override (deploy-service-qa-tester.cfapps.sap.hana.ondemand.com) after an earlier route-mapping rejection. App was up and running before OQ tests started. |
Tests (qa-tester) |
FAIL | 38 of 55 retried jobs failed both initial and retry; 17 passed on retry. Root cause was not regressions — see log analysis. |
Log analysis (log-analyzer) |
FAIL | 0 regression suspects in 50,343 log entries. The apparent "deploy-service crash at 17:23:11Z" was actually a deliberate manual stop by another operator at 17:05:04Z, which truncated mid-run scenarios and explains the 38 simultaneous failures. PR 1853's change surface is absent from all stack traces. |
Verdict rationale
The OQ test stage reports FAIL (38 of 55 retried jobs failed both attempts), but log-analyzer's investigation establishes that the failures are not attributable to this PR. CF audit events show the deploy-service app was deliberately stopped by another operator at 17:05:04Z — 16 minutes into the 41-minute window — and that stop truncated cloud-logging ingest at 17:05:09Z. There are zero app.crash events on 2026-06-01, and none of the PR's change surface (Staging.healthCheckInterval, HealthCheckInfo, RawCloudProcess, CloudControllerRestClientImpl, StagingParametersParser) appears in any stack trace within the window. The single regression suspect (the "instances down at 17:23:11Z" signal) was decided UNLIKELY_CAUSED_BY_PR by log-analyzer, so the verdict is downgraded from FAIL to PASS_WITH_WARNINGS — but this run does not produce a valid pass/fail signal for the PR and must be re-run on a CF target that is not being shared concurrently.
Failed jobs / scenarios (38)
app-configurable-wait-after-stop, app-features-scenario, app-stack-update, app-staging-failure, app-with-buildpacks, application-hooks, application-hooks-with-extension-descriptors, async-service-bindings-scenario, async-service-keys-scenario, bg-deploy-after-token-expiration, bg-deploy-keep-existing-app-names, bg-deploy-stop-reorder, blue-green-deploy, blue-green-deploy-with-route, cleaners-and-clean-up-job, cts-blue-green, cts-chunked-deploy, cts-custom-idp-authentication, cts-multipart-file-uploads, cts-multiple-mtas-deploy, gacd-in-deployed-after, hook-target-app, http2-routes, keep-existing-application-attributes-scenario, namespace-multiple-deploys, occasional-message-for-non-finishing-task-execution, only-async-services-scenario, optional-mta-resources-scenario, passing-secrets-during-deployment, rollback-mta-scenario, selective-deployment-scenario, shared-private-domain-scenario, update-service-scenario, whitelisting-default-visibility-scenario, whitelisting-visibility-failure-scenario, whitelisting-visibility-in-all-orgs-scenario, whitelisting-visibility-in-current-org-space-scenario, zip-attacks-scenario
Pre-merge action items
- Re-run OQ on a non-shared CF target. This run was invalidated 16 minutes in by an out-of-band manual stop of
deploy-serviceby another operator. The OQ failures are not a verdict on this PR. - Commit the version-pin fixes that were uncommitted in the working tree before merge. Log-analyzer flagged three version-pin mismatches in the committed state of the feature branch —
multiapps-controller:multiapps.version,xsa-multiapps-controller:multiapps.version, andxsa-multiapps-controller:multiapps-controller.version. The deployed WAR was version-consistent only because of the on-disk pom edits made by the pusher; CI will not see those.
Log analysis summary
- Expected (test-driven): 115
- Infrastructure / transient: 50,228
- Potentially regression-related: 0
- Likely caused by PR: 0
- Unlikely caused by PR: 1
- Inconclusive: 0
- Version skew:
multiapps-controller:multiapps.version,xsa-multiapps-controller:multiapps.version,xsa-multiapps-controller:multiapps-controller.version(committed-state only; deployed build was version-consistent via uncommitted working-tree pom fixes)
Detailed log-analyzer findings
CF event record around the apparent "crash"
2026-06-01T17:05:04Z—audit.app.stopby actorradito3@mail.bg(deliberate manual stop)- Zero
app.crashoraudit.app.process.crashevents on 2026-06-01 - Last cloud-logging entry from the app:
2026-06-01T17:05:09Z(consistent with clean stop, not crash)
Per-suspect attribution
| # | Signature | Verdict | Reasoning |
|---|---|---|---|
| 1 | Deploy-service instances down at 17:23:11Z | UNLIKELY_CAUSED_BY_PR |
CF audit events show manual stop by radito3@mail.bg at 17:05:04Z. Zero crash events on 2026-06-01. Log ingest terminated cleanly at 17:05:09Z, consistent with graceful stop. No error pattern from the PR's change surface appears anywhere in the 50,343-entry log window. |
| 2 | Version-pin skew in committed feature-branch POMs (3 mismatches) | UNLIKELY_CAUSED_BY_PR (runtime) / process risk (pre-merge) |
Not a runtime regression for this run (uncommitted fixes were deployed). Becomes a real regression if PR is merged without committing the pom fixes. |
Strong attributions (LIKELY_CAUSED_BY_PR — investigate first)
None. No suspects in Bucket C. No stack frame from the PR's change surface appears in any log entry in the window.
Posted by pr-result-publisher via orchestrator (publisher's GitHub MCP was unavailable). Mode: oq. Generated 2026-06-01 UTC.
MTA Quality Report — cloudfoundry/multiapps-controller PR #1853Jira: LMCROSSITXSADEPLOY-3316 — Introduce Health check interval
Code Review
Security
SonarCloud
All SonarCloud / CodeQL / build check runs for head SHA Dependency CVEs✅ No new CVEs — this PR does not modify any dependency files ( Note: three of four subagents ( |



Summary
Introduces the
health-check-intervalMTA module parameter, achieving feature parity with Cloud Foundry's native v3 processHealthCheck.Data.intervalfield. The parameter is an integer (seconds) and follows the same kebab-case / omit-when-absent semantics as the otherhealth-check-*parameters already supported by the controller.The introduction was previously blocked because the underlying CF Java client did not expose the field; recent versions of the client now support it via
Data.Builder.interval(...), which is already used on the readiness side.Linked Jira:
LMCROSSITXSADEPLOY-3316— Introduce Health check intervalChanges
multiapps-controller-coreSupportedParameters— declares the newhealth-check-intervalparameter so the parameter validator accepts it onapplicationmodules.StagingParametersParser— parses the parameter throughPropertiesUtil(returnsnullwhen absent, preserving omit-when-absent semantics) and propagates it onto theStagingrequest object.multiapps-controller-clientStaging— adds the request-sidehealthCheckIntervalfield with builder support so the parsed value flows from parser to the CF API call.CloudProcess/RawCloudProcess— adds the response-sidehealthCheckIntervalfield and derives it from the v3 process payload when reading process state back from CF.HealthCheckInfo— includeshealthCheckIntervalin the equality / diff used byStagingApplicationAttributeUpdater, so that changing only the interval triggers a staging update (matching the behaviour of the other health-check fields).CloudControllerRestClientImpl#buildHealthCheck— passes the value intoData.Builder.interval(...)when assembling the v3 health-check object. Builder acceptsnulland skips serialisation in that case, so the on-the-wire shape is unchanged when the parameter is absent.Tests
Unit tests added / extended in this PR (separate test commit):
HealthCheckInfoTest(new)RawCloudProcessTest(new)StagingParametersParserTest(extended)health-check-interval(present and absent / null cases).Scoped Maven verification:
All targeted tests pass. Note for reviewers: the
multiapps-controller-coremodule currently has pre-existing test-compile issues in unrelated classes (HealthRetrieverTest,ApplicationConfigurationTest,MtaDescriptorPropertiesResolverTest) — these surface only on a full module test compile and are out of scope for this change.Acceptance criteria
Per the Jira ticket, one should be able to deploy an MTA with the snippet above and have the resulting CF app be configured with a 15-second health-check interval. The runtime path is exercised end-to-end by the parser → Staging →
buildHealthCheck→ CF Java clientData.Builder.intervalchain covered above.Backwards compatibility
When the parameter is absent,
PropertiesUtilreturnsnull,Data.Builder.interval(null)does not serialise the field, and the existingHealthCheckInfoequality semantics treat null-vs-null as equal. Existing MTAs and existing app states are therefore unaffected — no spurious staging updates are triggered on first deploy after the upgrade.