CLDSRV-908: CopyObject handle checksums#6176

Open
leif-scality wants to merge 5 commits into development/9.4 from improvement/CLDSRV-908-copy-object-handle-checksums

@leif-scality leif-scality commented May 27, 2026

  • Forward a src object checksum to the dest object
  • Recompute the checksum when required (x-amz-checksum-algorithm header set, src object MPU with COMPOSITE checksum, ...)
  • Compute a CRC64NVME checksum for the dest object if the src object has no checksum
  ┌─────┬─────────────────────┬────────────────────────────────────┬──────────────────────────────────────────────────────────────────┐
  │  #  │   Source checksum   │ x-amz-checksum-algorithm requested │                            Recompute?                            │
  ├─────┼─────────────────────┼────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
  │ 1   │ none                │ none                               │ Yes (Compute default CRC64NVME)                                  │
  ├─────┼─────────────────────┼────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
  │ 2   │ FULL_OBJECT, algo X │ none                               │ No — propagate as-is                                             │
  ├─────┼─────────────────────┼────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
  │ 3   │ FULL_OBJECT, algo X │ X (same algo)                      │ No — propagate as-is                                             │
  ├─────┼─────────────────────┼────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
  │ 4   │ FULL_OBJECT, algo X │ Y (different algo)                 │ Yes                                                              │
  ├─────┼─────────────────────┼────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
  │ 5   │ COMPOSITE, algo X   │ none                               │ Yes (can't propagate a MPU format digest to a single-object dest)│
  ├─────┼─────────────────────┼────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
  │ 6   │ COMPOSITE, algo X   │ X (same algo)                      │ Yes                                                              │
  ├─────┼─────────────────────┼────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
  │ 7   │ COMPOSITE, algo X   │ Y (different algo)                 │ Yes                                                              │
  ├─────┼─────────────────────┼────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
  │ 8   │ none                │ any algo                           │ Yes (source has no digest, must compute)                         │
  ├─────┼─────────────────────┼────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
  │ 9   │ any                 │ source is 0-byte                   │ Yes (special-cased — empty-bytes digest, no streaming)           │
  └─────┴─────────────────────┴────────────────────────────────────┴──────────────────────────────────────────────────────────────────┘
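
The table can be read as a small decision function. The sketch below is only an illustration of the rows above, not the PR's `_shouldRecomputeChecksum`: the function name, the `sourceChecksum` shape (`{ algorithm, type }`) and the `sourceSize` argument are assumptions made for the example.

```javascript
'use strict';

// Hypothetical helper mirroring the table above.
// `sourceChecksum` is an assumed shape ({ algorithm, type }), or undefined
// when the source object carries no checksum.
function shouldRecompute(requestedAlgo, sourceChecksum, sourceSize) {
    // Row 9: a 0-byte source is special-cased (empty-bytes digest).
    if (sourceSize === 0) {
        return true;
    }
    // Rows 1 and 8: the source has no digest, so one must be computed.
    if (!sourceChecksum) {
        return true;
    }
    // Rows 5 to 7: an MPU COMPOSITE digest is never propagated.
    if (sourceChecksum.type === 'COMPOSITE') {
        return true;
    }
    // Row 2: FULL_OBJECT source and no algorithm requested.
    if (!requestedAlgo) {
        return false;
    }
    // Rows 3 and 4: recompute only when the requested algorithm differs.
    return requestedAlgo.toLowerCase() !== sourceChecksum.algorithm.toLowerCase();
}
```

For example, `shouldRecompute('SHA256', { algorithm: 'crc32', type: 'FULL_OBJECT' }, 10)` falls under row 4 and returns `true`.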

bert-e commented May 27, 2026

Hello leif-scality,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

bert-e commented May 27, 2026

Incorrect fix version

The Fix Version/s in issue CLDSRV-908 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.4.0

Please check the Fix Version/s of CLDSRV-908, or the target
branch of this pull request.

Comment thread lib/api/objectCopy.js Dismissed
Comment thread tests/functional/aws-node-sdk/test/object/objectCopy.js Outdated

claude Bot commented May 27, 2026

LGTM — clean implementation of checksum propagation and recompute on CopyObject. Stream handling (jsutil.once guards, Azure per-part passthrough, error propagation) and the _shouldRecomputeChecksum decision logic are solid. One minor test style issue flagged inline.

- require() inside describe/withV4 callback in functional test — move to top of file

Review by Claude Code

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 83.49515% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.26%. Comparing base (580d648) to head (720f686).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
lib/api/objectCopy.js 81.91% 17 Missing ⚠️
Additional details and impacted files

Files with missing lines Coverage Δ
lib/api/apiUtils/integrity/validateChecksums.js 99.08% <100.00%> (+0.03%) ⬆️
lib/api/objectCopy.js 90.33% <81.91%> (-1.17%) ⬇️

... and 1 file with indirect coverage changes

@@                 Coverage Diff                 @@
##           development/9.4    #6176      +/-   ##
===================================================
+ Coverage            85.25%   85.26%   +0.01%     
===================================================
  Files                  208      208              
  Lines                13919    14012      +93     
===================================================
+ Hits                 11867    11948      +81     
- Misses                2052     2064      +12     
Flag Coverage Δ
file-ft-tests 68.86% <72.81%> (+0.04%) ⬆️
kmip-ft-tests 28.09% <23.30%> (-0.03%) ⬇️
mongo-v0-ft-tests 70.00% <72.81%> (+0.02%) ⬆️
mongo-v1-ft-tests 70.01% <72.81%> (-0.06%) ⬇️
multiple-backend 36.40% <5.82%> (-0.21%) ⬇️
sur-tests 35.29% <10.67%> (-1.08%) ⬇️
sur-tests-inflights 37.16% <10.67%> (-0.18%) ⬇️
unit 72.25% <83.49%> (+0.23%) ⬆️
utapi-v2-tests 34.52% <23.30%> (-0.26%) ⬇️

@leif-scality leif-scality force-pushed the improvement/CLDSRV-908-copy-object-handle-checksums branch from 5ca489a to 57d9894 Compare May 28, 2026 09:41

claude Bot commented May 28, 2026

LGTM

Review by Claude Code

@leif-scality leif-scality force-pushed the improvement/CLDSRV-908-copy-object-handle-checksums branch from 57d9894 to 5bd2262 Compare May 28, 2026 13:29
Comment thread lib/api/objectCopy.js
Comment thread lib/api/objectCopy.js

claude Bot commented May 28, 2026

  • Stream leak on source read error: when sourceStream errors in the recompute path, checksumStream is not destroyed, so data.put (reading from it) may hang indefinitely
    - Destroy checksumStream in the sourceStream error handler
  • Data orphan on copy-to-self with recompute: the recompute path writes new data via data.put, but deleteExistingData skips cleanup when sourceIsDestination is true, orphaning the old data locations
    - Track whether new data was written in the recompute path and allow cleanup in deleteExistingData accordingly
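
For the first point, a minimal sketch of the suggested fix, assuming plain Node.js streams. `pipeWithCleanup` is a hypothetical name, and the `PassThrough` instances only stand in for the PR's source stream and checksum transform.

```javascript
'use strict';
const { PassThrough } = require('stream');

// Hypothetical wiring: `checksumStream` stands in for the checksum transform.
function pipeWithCleanup(sourceStream, checksumStream) {
    sourceStream.once('error', err => {
        // pipe() does not forward source errors downstream, so destroy the
        // downstream stream explicitly; whoever reads from it then gets the
        // error instead of waiting for data that never arrives.
        checksumStream.destroy(err);
    });
    return sourceStream.pipe(checksumStream);
}

// Usage with stand-in streams.
const source = new PassThrough();
const checksum = new PassThrough();
checksum.on('error', () => {}); // consumer-side handler (e.g. the writer)
pipeWithCleanup(source, checksum);
source.emit('error', new Error('source read failed'));
console.log(checksum.destroyed);
```

`stream.pipeline` gives the same teardown guarantee without a hand-written handler, if the surrounding code can be restructured around it.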

Review by Claude Code

@leif-scality leif-scality force-pushed the improvement/CLDSRV-908-copy-object-handle-checksums branch from 5bd2262 to 8b2b196 Compare May 28, 2026 15:13

claude Bot commented May 28, 2026

LGTM

Review by Claude Code

Propagate the source FULL_OBJECT checksum when the algorithm matches the
request (or no algorithm was requested); otherwise stream source bytes
through a ChecksumTransform to compute a fresh digest before writing the
destination.
@leif-scality leif-scality force-pushed the improvement/CLDSRV-908-copy-object-handle-checksums branch from 8b2b196 to 720f686 Compare May 28, 2026 16:28

claude Bot commented May 28, 2026

LGTM

Review by Claude Code

'The x-amz-checksum-type header can only be used ' + 'with the x-amz-checksum-algorithm header.',
);

// TODO: Update with 'MD5', 'SHA512', 'XXHASH128', 'XXHASH3', 'XXHASH64' when they are introduced.

You can add the ticket number to the comment if there is a ticket for this TODO.

Comment thread lib/api/objectCopy.js
* @returns {boolean}
*/
function _shouldRecomputeChecksum(headers, sourceObjMD) {
const requestedAlgo = headers['x-amz-checksum-algorithm'] && headers['x-amz-checksum-algorithm'].toLowerCase();

I think we can use the optional chaining operator in cloudserver.

Suggested change
- const requestedAlgo = headers['x-amz-checksum-algorithm'] && headers['x-amz-checksum-algorithm'].toLowerCase();
+ const requestedAlgo = headers['x-amz-checksum-algorithm']?.toLowerCase();

Maybe this can be used in other places as well to simplify the code.
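
A quick check that the two forms agree for the cases that matter here (header present, header absent). The function names and the `headers` objects are made up for the example. The forms differ only for a `null` value (`&&` yields `null`, `?.` yields `undefined`), and both results are falsy.

```javascript
'use strict';

// Original form: guard with && before calling toLowerCase().
function algoWithAnd(headers) {
    return headers['x-amz-checksum-algorithm']
        && headers['x-amz-checksum-algorithm'].toLowerCase();
}

// Suggested form: optional chaining short-circuits on undefined/null.
function algoWithOptionalChaining(headers) {
    return headers['x-amz-checksum-algorithm']?.toLowerCase();
}

const withHeader = { 'x-amz-checksum-algorithm': 'SHA256' };
const withoutHeader = {};
console.log(algoWithAnd(withHeader), algoWithOptionalChaining(withHeader));
console.log(algoWithAnd(withoutHeader), algoWithOptionalChaining(withoutHeader));
```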

Comment thread lib/api/objectCopy.js
Comment on lines +67 to +68
if (err) {
done(err);

When this error happens, it could be any error, and perPart is still piped.

So it could theoretically keep receiving data from Azure and continue streaming for a short time window, until the final callback calls passthrough.destroy and unpipes the source perPart.

Maybe perPart should be destroyed here to stop the streaming immediately.

Comment thread lib/api/objectCopy.js
// into the master passthrough and use its 'end' as the completion
// signal — same pattern arsenal's data.copyObject uses.
const perPart = new PassThrough();
perPart.once('error', done);

Should we consider wrapping the error to include some part details, for better troubleshooting if the error is ever logged somewhere else in the streaming path?
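
One possible shape for this, as a sketch only: `wrapPartError`, `partNumber` and `location` are hypothetical names, and the `cause` option assumes a Node.js version that supports it (16.9 or later). Whether wrapping is appropriate also depends on how callers map the original error to an S3 error response.

```javascript
'use strict';

// Hypothetical wrapper: adds part details while keeping the original error
// reachable through `cause`.
function wrapPartError(err, partNumber, location) {
    const wrapped = new Error(
        `copy failed while streaming part ${partNumber} (${location}): ${err.message}`,
        { cause: err },
    );
    wrapped.partNumber = partNumber;
    return wrapped;
}

// Possible use at the call site (illustrative):
// perPart.once('error', err => done(wrapPartError(err, partNumber, location)));
const original = new Error('socket hang up');
const wrappedExample = wrapPartError(original, 3, 'azure-location');
console.log(wrappedExample.message);
```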

Comment thread lib/api/objectCopy.js
// and masterKeyId stored properly in metadata
if (sourceIsDestination && storeMetadataParams.locationMatch
- && !isVersionedObj && !needsEncryption) {
+ && !isVersionedObj && !needsEncryption && !shouldRecomputeChecksum) {

If you don't skip, you'll trigger a data GET + PUT + DELETE (for the previous location).

This is only for checksum recomputation. Would it be better to define another path where, if only the checksum needs recomputing, the GET streams into the ChecksumTransform and the data is then discarded, avoiding the data PUT + DELETE?
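
A rough sketch of such a digest-only path, under stated assumptions: `digestOnly` is a hypothetical helper, and `crypto.createHash` with `sha256` stands in for the PR's ChecksumTransform (Node's `crypto` does not provide the CRC variants). The source stream is consumed only to feed the hash, so nothing is written to a data backend.

```javascript
'use strict';
const crypto = require('crypto');
const { Readable } = require('stream');

// Hypothetical helper: read the source once, keep only the digest.
// No data PUT happens, so there is no old location to DELETE afterwards.
async function digestOnly(sourceStream, nodeHashName) {
    const hash = crypto.createHash(nodeHashName);
    // Async iteration destroys the stream if the loop exits on an error.
    for await (const chunk of sourceStream) {
        hash.update(chunk);
    }
    return hash.digest('base64');
}

// Usage with an in-memory stand-in for the data GET stream.
const fakeGet = Readable.from([Buffer.from('ab'), Buffer.from('c')]);
digestOnly(fakeGet, 'sha256').then(digest => console.log(digest));
```

The trade-off is that the existing data location is kept as-is and only the metadata is updated, which is what the skip condition above already does for the non-recompute case.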
