PR history may show revived ProwJob attempts as duplicate build IDs #1181

@wallrj

Summary

I think the cert-manager Prow deployment may be surfacing revived attempts of the same ProwJob as separate build IDs in Deck / PR history.

At first glance this looks like duplicate webhook-triggered jobs, but the artefacts for cert-manager/cert-manager#8794 suggest it is actually the same ProwJob UID being shown more than once after an interruption.

Screenshot from PR history at the time of writing:

PR history showing paired e2e-v1-35 entries

PR history page:

Context links:

Why this looks suspicious

For pull-cert-manager-master-e2e-v1-35 on cert-manager/cert-manager#8794, the same head revision is often shown as a pair of nearby build IDs.

For example, after /retest on 2026-05-22, PR history showed:

  • 2057698540381016064
  • 2057715292406026240

These do not look like two separate ProwJobs.

Evidence that both artefact paths belong to the same ProwJob

From 2057698540381016064/prowjob.json:

{
  "metadata": {
    "name": "8f0b8624-9f3f-46bd-a8c1-7cf27035d5b2",
    "uid": "91cb38d6-256a-4784-94ed-42e6010bc00a",
    "creationTimestamp": "2026-05-22T05:42:16Z",
    "labels": {
      "prow.k8s.io/build-id": "2057698540381016064"
    },
    "generation": 2
  }
}

From 2057715292406026240/prowjob.json:

{
  "metadata": {
    "name": "8f0b8624-9f3f-46bd-a8c1-7cf27035d5b2",
    "uid": "91cb38d6-256a-4784-94ed-42e6010bc00a",
    "creationTimestamp": "2026-05-22T05:42:16Z",
    "labels": {
      "prow.k8s.io/build-id": "2057715292406026240"
    },
    "generation": 8
  }
}

So the name, uid, and creationTimestamp are the same, but the build-id changed.
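A minimal sketch of how this comparison could be automated across several artefact paths, assuming each prowjob.json has already been loaded into a dict with the metadata shape shown above. The helper names (build_ids_by_uid, revived_uids) are hypothetical and are not part of Prow.

```python
from collections import defaultdict


def build_ids_by_uid(prowjobs):
    """Map each ProwJob UID to the sorted list of build IDs seen for it."""
    grouped = defaultdict(set)
    for pj in prowjobs:
        meta = pj["metadata"]
        grouped[meta["uid"]].add(meta["labels"]["prow.k8s.io/build-id"])
    return {uid: sorted(ids) for uid, ids in grouped.items()}


def revived_uids(prowjobs):
    """Keep only the UIDs that appear under more than one build ID."""
    return {
        uid: ids
        for uid, ids in build_ids_by_uid(prowjobs).items()
        if len(ids) > 1
    }


# Metadata copied from the two prowjob.json excerpts above.
first = {
    "metadata": {
        "name": "8f0b8624-9f3f-46bd-a8c1-7cf27035d5b2",
        "uid": "91cb38d6-256a-4784-94ed-42e6010bc00a",
        "labels": {"prow.k8s.io/build-id": "2057698540381016064"},
    }
}
second = {
    "metadata": {
        "name": "8f0b8624-9f3f-46bd-a8c1-7cf27035d5b2",
        "uid": "91cb38d6-256a-4784-94ed-42e6010bc00a",
        "labels": {"prow.k8s.io/build-id": "2057715292406026240"},
    }
}

for uid, ids in revived_uids([first, second]).items():
    print(uid, "->", ", ".join(ids))
```

Grouping on uid rather than name is a deliberate choice here: the uid is assigned by the API server, so two artefact paths sharing it must come from the same ProwJob object.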

The second artefact path then shows a pre-emption-style interruption

From 2057715292406026240/podinfo.json:

{
  "type": "DisruptionTarget",
  "status": "True",
  "reason": "PreemptionByScheduler",
  "message": "default-scheduler: preempting to accommodate a higher priority pod"
}
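To look for the same signal in other runs, something like the following could filter pod conditions for disruption markers. It is only a sketch: where the conditions list sits inside podinfo.json is not shown above, so the function takes the list directly, and the second condition in the sample data is a hypothetical entry added for contrast.

```python
def disruption_conditions(conditions):
    """Return the conditions that mark the pod as a disruption target."""
    return [
        c
        for c in conditions
        if c.get("type") == "DisruptionTarget" and c.get("status") == "True"
    ]


conditions = [
    # Copied from the podinfo.json excerpt above.
    {
        "type": "DisruptionTarget",
        "status": "True",
        "reason": "PreemptionByScheduler",
        "message": "default-scheduler: preempting to accommodate a higher priority pod",
    },
    # Hypothetical extra condition, present only to show that it is skipped.
    {"type": "Ready", "status": "False"},
]

for cond in disruption_conditions(conditions):
    print(cond["reason"], "-", cond["message"])
```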

The two started.json files also show two different attempt start times for that same logical ProwJob:

The first attempt also has a sidecar interrupt shortly before the second attempt appears:

Why I think this may be related to cert-manager's custom controller image

cert-manager is not using the stock upstream prow-controller-manager image.

In this repo:

That image is built from:

  • https://github.com/inteon/prow.git
  • branch: option_recreate_prowjob_on_termination

and the README says it includes the changes from:

  • kubernetes-sigs/prow#117 — “Revive prowjob when node is terminated (enabled by default)”

So I wonder whether what Deck is showing here is:

  1. an initial attempt,
  2. the custom controller reviving the same ProwJob after interruption / node loss,
  3. a new build ID being assigned,
  4. PR history then showing both build IDs side-by-side.
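One way to probe step 3 is to check when each build ID was minted. The sketch below assumes the build IDs are snowflake-style integers (millisecond timestamp in the bits above the low 22, counted from the Twitter epoch of 1288834974657 ms). That layout is an assumption on my part and is not confirmed anywhere in the artefacts; minted_at is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: snowflake layout with a millisecond timestamp above the
# low 22 bits, measured from the Twitter epoch.
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def minted_at(build_id):
    """Decode the creation time embedded in a snowflake-style build ID."""
    ms = (int(build_id) >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)


first = minted_at("2057698540381016064")
second = minted_at("2057715292406026240")

print("first build ID minted: ", first.isoformat())
print("second build ID minted:", second.isoformat())
print("gap between them:      ", second - first)
```

Under that assumption, the first ID decodes to the same second as the ProwJob's creationTimestamp (2026-05-22T05:42:16Z) and the second to a little over 66 minutes later, which would fit a new build ID being assigned to an already existing ProwJob.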

Questions

  1. Is this paired-build-ID behaviour expected when using prow-controller-manager-spot?
  2. Should revived attempts of the same ProwJob be shown differently in PR history / Deck?
  3. Is the build ID supposed to change for a revived ProwJob with the same UID, or is that the bug?
  4. Could this explain why the PR history can look like duplicate presubmit triggers even when GitHub only triggered one ProwJob?

If helpful, I can gather a few more examples from other PRs, but cert-manager/cert-manager#8794 already seems to show the pattern fairly clearly.

Updated example from 2026-05-23

A newer rerun on the same PR now shows an even stronger version of the pattern.

Updated screenshot:

Updated PR history showing repeated build IDs for the same logical job

The relevant build IDs are:

  • 2058066701156618240
  • 2058068411522486272
  • 2058095845898719232

The first and third artefact paths again point at the same logical ProwJob.

From 2058066701156618240/prowjob.json:

{
  "metadata": {
    "name": "0afc6fc5-b218-4d2b-90f1-9481883b6ba3",
    "uid": "b5331b16-8649-4b83-ae95-bcc37790cd0b",
    "creationTimestamp": "2026-05-23T06:05:13Z",
    "labels": {
      "prow.k8s.io/build-id": "2058066701156618240"
    },
    "generation": 2
  }
}

From 2058095845898719232/prowjob.json:

{
  "metadata": {
    "name": "0afc6fc5-b218-4d2b-90f1-9481883b6ba3",
    "uid": "b5331b16-8649-4b83-ae95-bcc37790cd0b",
    "creationTimestamp": "2026-05-23T06:05:13Z",
    "labels": {
      "prow.k8s.io/build-id": "2058095845898719232"
    },
    "generation": 10
  }
}

So again the name, uid, and creationTimestamp stayed the same while the build ID changed.

The build log at the intermediate artefact path, 2058068411522486272/build-log.txt, shows that attempt being aborted while it hung in the samplewebhook setup step:

/home/prow/go/src/github.com/cert-manager/cert-manager/_bin/tools/helm upgrade \
    --install \
    --wait \
    --namespace samplewebhook \
    --create-namespace \
    samplewebhook make/config/samplewebhook/chart >/dev/null

The build log at the later artefact path, 2058095845898719232/build-log.txt, shows that attempt eventually failing at the same step after the full 2h timeout:

/home/prow/go/src/github.com/cert-manager/cert-manager/_bin/tools/helm upgrade \
    --install \
    --wait \
    --namespace samplewebhook \
    --create-namespace \
    samplewebhook make/config/samplewebhook/chart >/dev/null
{"component":"entrypoint","msg":"Process did not finish before 2h0m0s timeout"}

So this looks less like a simple pair and more like:

  1. initial ProwJob artefact path,
  2. intermediate aborted/revived attempt,
  3. later/final artefact path for the same ProwJob UID.
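To tell these attempts apart without reading each build-log.txt by hand, a small filter could pull out the structured lines written by the entrypoint component. This is a sketch that assumes such lines are single-line JSON objects with "component" and "msg" keys, as in the timeout line quoted above; entrypoint_messages is a hypothetical helper.

```python
import json


def entrypoint_messages(log_text):
    """Collect the msg field of every JSON log line emitted by entrypoint."""
    messages = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # ordinary test output, not a structured log line
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # looked like JSON but was not
        if isinstance(record, dict) and record.get("component") == "entrypoint":
            messages.append(record.get("msg", ""))
    return messages


# Tail of the later build log, as quoted above (helm command shortened).
sample_log = """\
    samplewebhook make/config/samplewebhook/chart >/dev/null
{"component":"entrypoint","msg":"Process did not finish before 2h0m0s timeout"}
"""

for msg in entrypoint_messages(sample_log):
    print(msg)
```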
