merge: use repo_in_merge_bases for octopus up-to-date check#2110
Conversation
The octopus merge path checks whether each remote head is already an ancestor of HEAD by computing all merge-bases via repo_get_merge_bases() and comparing the first result's OID to the remote head. This is more expensive than necessary: repo_get_merge_bases() calls paint_down_to_common() with min_generation=0, performs the full STALE drain, and may run remove_redundant(), when all we need is a yes/no reachability answer. Replace this with repo_in_merge_bases(), which answers the is-ancestor question directly. When generation numbers are available, repo_in_merge_bases() uses can_all_from_reach() -- a DFS bounded by generation number that stops as soon as the target is found or ruled out, without entering paint_down_to_common() at all. Without generation numbers, it still benefits from a tighter min_generation floor. Signed-off-by: Kristofer Karlsson <krka@spotify.com>
989c1f7 to
bc97680
Compare
|
/submit |
|
Submitted as pull.2110.git.1778566286543.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
This patch series was integrated into seen via git@cdc64b6. |
|
This branch is now known as |
|
This patch series was integrated into seen via git@45bc638. |
|
This patch series was integrated into seen via git@8adfe4a. |
|
This patch series was integrated into seen via git@01296ef. |
|
There was a status update in the "New Topics" section about the branch The logic to determine that branches in an octopus merge are independent has been optimized. Comments? source: <pull.2110.git.1778566286543.gitgitgadget@gmail.com> |
|
This patch series was integrated into seen via git@bff691d. |
|
Derrick Stolee wrote on the Git mailing list (how to reply to this email): On 5/12/2026 2:11 AM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson <krka@spotify.com>
>
> The octopus merge path checks whether each remote head is already
> an ancestor of HEAD by computing all merge-bases via
> repo_get_merge_bases() and comparing the first result's OID to
> the remote head. This is more expensive than necessary:
> repo_get_merge_bases() calls paint_down_to_common() with
> min_generation=0, performs the full STALE drain, and may run
> remove_redundant(), when all we need is a yes/no reachability
> answer.
>
> Replace this with repo_in_merge_bases(), which answers the
> is-ancestor question directly. When generation numbers are
> available, repo_in_merge_bases() uses can_all_from_reach() -- a
> DFS bounded by generation number that stops as soon as the target
> is found or ruled out, without entering paint_down_to_common() at
> all. Without generation numbers, it still benefits from a tighter
> min_generation floor.
>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
> merge: use repo_in_merge_bases for octopus up-to-date check
>
> While reviewing callers of repo_get_merge_bases() for a different patch,
> I noticed the octopus up-to-date loop in builtin/merge.c computes full
> merge-bases only to check whether each remote head is an ancestor of
> HEAD.
>
> The existing code calls repo_get_merge_bases(), takes the first result,
> frees the list, and compares the OID to the remote head. This is
> equivalent to an is-ancestor check, which repo_in_merge_bases() answers
> directly.
>
> Using repo_in_merge_bases() simplifies the code (-14/+4 lines) and
> avoids unnecessary work: with generation numbers it uses
> can_all_from_reach() instead of paint_down_to_common(), and without
> generation numbers it still benefits from a tighter min_generation
> floor. In practice this only matters for octopus merges on repos with
> deep history, so the main value here is the simplification.
The code change looks right to me. Do you have any performance numbers
to share? Or was this motivated mostly as an opportunity for code cleanup?
Thanks,
-Stolee
|
|
User |
|
Kristofer Karlsson wrote on the Git mailing list (how to reply to this email): Good question! No, this was intended only as a code cleanup and
semantic simplification.
The code path seems like an edge case, so benchmarking it did not seem
worthwhile.
The main win as I see it is clearer semantics for what it's doing and
the optimization is just a bonus.
Checking if repo_get_merge_bases(HEAD, J) == J is better expressed as
repo_in_merge_bases(J, HEAD)
repo_in_merge_bases should probably be renamed to something like
repo_is_ancestor, since its comment says:
Is "commit" an ancestor of (i.e. reachable from) the "reference"?
but I can understand that it may be painful to rename it in practice.
If I do a mental rename, this becomes simpler to reason about:
Checking if repo_get_merge_bases(HEAD, J) == J is better expressed as
repo_is_ancestor(J, HEAD)
- Kristofer
On Mon, 18 May 2026 at 14:20, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/12/2026 2:11 AM, Kristofer Karlsson via GitGitGadget wrote:
> > From: Kristofer Karlsson <krka@spotify.com>
> >
> > The octopus merge path checks whether each remote head is already
> > an ancestor of HEAD by computing all merge-bases via
> > repo_get_merge_bases() and comparing the first result's OID to
> > the remote head. This is more expensive than necessary:
> > repo_get_merge_bases() calls paint_down_to_common() with
> > min_generation=0, performs the full STALE drain, and may run
> > remove_redundant(), when all we need is a yes/no reachability
> > answer.
> >
> > Replace this with repo_in_merge_bases(), which answers the
> > is-ancestor question directly. When generation numbers are
> > available, repo_in_merge_bases() uses can_all_from_reach() -- a
> > DFS bounded by generation number that stops as soon as the target
> > is found or ruled out, without entering paint_down_to_common() at
> > all. Without generation numbers, it still benefits from a tighter
> > min_generation floor.
> >
> > Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> > ---
> > merge: use repo_in_merge_bases for octopus up-to-date check
> >
> > While reviewing callers of repo_get_merge_bases() for a different patch,
> > I noticed the octopus up-to-date loop in builtin/merge.c computes full
> > merge-bases only to check whether each remote head is an ancestor of
> > HEAD.
> >
> > The existing code calls repo_get_merge_bases(), takes the first result,
> > frees the list, and compares the OID to the remote head. This is
> > equivalent to an is-ancestor check, which repo_in_merge_bases() answers
> > directly.
> >
> > Using repo_in_merge_bases() simplifies the code (-14/+4 lines) and
> > avoids unnecessary work: with generation numbers it uses
> > can_all_from_reach() instead of paint_down_to_common(), and without
> > generation numbers it still benefits from a tighter min_generation
> > floor. In practice this only matters for octopus merges on repos with
> > deep history, so the main value here is the simplification.
>
> The code change looks right to me. Do you have any performance numbers
> to share? Or was this motivated mostly as an opportunity for code cleanup?
>
> Thanks,
> -Stolee
>
> |
|
User |
While reviewing callers of repo_get_merge_bases() for a different patch, I noticed the octopus up-to-date loop in builtin/merge.c computes full merge-bases only to check whether each remote head is an ancestor of HEAD.
The existing code calls repo_get_merge_bases(), takes the first result, frees the list, and compares the OID to the remote head. This is equivalent to an is-ancestor check, which repo_in_merge_bases() answers directly.
Using repo_in_merge_bases() simplifies the code (-14/+4 lines) and avoids unnecessary work: with generation numbers it uses can_all_from_reach() instead of paint_down_to_common(), and without generation numbers it still benefits from a tighter min_generation floor. In practice this only matters for octopus merges on repos with deep history, so the main value here is the simplification.
The comment "Here we have to calculate the individual merge_bases again" dates from 2008 (1c7b76b, "Build in merge"). At the time, in_merge_bases() was the same cost as computing merge bases. Stolee's 2018 generation number work (d7c1ec3) and the later switch to can_all_from_reach (6cc0174) made it significantly cheaper, but this call site was never updated.
cc: Derrick Stolee stolee@gmail.com
cc: Kristofer Karlsson krka@spotify.com