
Add Yezzey submodule to gpcontrib #1752

Open
leborchuk wants to merge 2 commits into apache:main from leborchuk:AddYezzey

@leborchuk
Contributor

Yezzey is an open-source extension for Apache Cloudberry and Greenplum 6 that transparently offloads Append-Only (AO/AOCO) table data to S3-compatible object storage. Inspired by Snowflake and AnyBlob, it extends the storage manager (smgr) so reads and writes go to S3 instead of local disk, keeping the user interface unchanged. A companion YProxy service acts as an I/O scheduler, managing connection pooling and request prioritization to prevent S3 throttling. Data is PGP-encrypted during upload. Benchmarks show only 10–43% query slowdown versus local storage, far outperforming PXF, making it ideal for cost-effective cold-data tiering.

The main feature of Yezzey is that you don't need to change tables or code: just call yezzey_define_offload_policy and the data moves to S3. In this way, you can offload data from your cluster and free up local disk space.

Currently, it is widely used on Greenplum 6 instances, and the goal is to provide users with the same interface in Cloudberry, so they can seamlessly migrate to Cloudberry.

We added Yezzey as a submodule because we believe that one day PAX will replace all outdated solutions such as AO/AOCO/Yezzey. That has not happened yet, so we still need Yezzey.

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@leborchuk
Contributor Author

This is part of our roadmap, which we discussed in #868.

See the item "Support Compute/Storage decouple by introducing Yezzey".

@leborchuk leborchuk marked this pull request as ready for review May 19, 2026 07:29
Copilot AI review requested due to automatic review settings May 19, 2026 07:29
@tuhaihe
Member

tuhaihe commented May 19, 2026

We also need to add the yezzey license info to the licenses dir and LICENSE, like other submodules. FYI.

  • pom.xml.

@tuhaihe
Member

tuhaihe commented May 19, 2026

For managing the new submodule, we can introduce it in the same way discussed here: #1084 (review)

This comment was marked as off-topic.

@leborchuk
Contributor Author

> We also need to add the yezzey license info to the licenses dir and LICENSE, like other submodules. FYI.
>
>   • pom.xml.

Fixed.

@leborchuk
Contributor Author

> For managing the new submodule, we can introduce it in the same way discussed here: #1084 (review)

Yes, I did it as described there. The issue is that no tag information is stored: the tag is an ephemeral label that appears only in the git submodule output. See:

xifos@xifos-dev-jammy:~/git/cloudberry-leborchuk$ git submodule status
 0da57b85cf23e48d0e515f58c65a25425dbde012 contrib/pax_storage/src/cpp/contrib/googlebench (v1.9.2-3-g0da57b8)
 52204f78f94d7512df1f0f3bea1d47437a2c3a58 contrib/pax_storage/src/cpp/contrib/googletest (release-1.8.0-3536-g52204f78)
 3a58301067bbc03da89ae5a51b3e05b7da719d38 contrib/pax_storage/src/cpp/contrib/tabulate (v1.3-51-g3a58301)
 61c03f62b370b685b7994830b570a88d05ba15ab dependency/yyjson (0.10.0-20-g61c03f6)
 a09ea700d32bab83325aff9ff34d0582e50e3997 gpcontrib/gpcloud/test/googletest (release-1.8.0-2358-ga09ea700)
 0d88f66a5fd0dba82681eef5929529cb153cb325 gpcontrib/yezzey (1.8.8)

Here yezzey is linked to the 1.8.8 tag, but the stored link is still only a commit hash. If I delete the tag, git submodule shows just the commit hash without any tag info.
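
To make the distinction above concrete, here is a minimal sketch that splits a `git submodule status` line into its parts. Per the git documentation, that command prints the commit SHA-1, the submodule path, and the output of `git describe` for that SHA-1, so the name in parentheses is computed at display time while the superproject records only the hash. The names `SubmoduleStatus` and `parse_submodule_status` are hypothetical helpers for illustration, not part of git or of this PR, and the regular expression assumes paths that do not themselves end in a parenthesised suffix.

```python
import re
from typing import NamedTuple, Optional


class SubmoduleStatus(NamedTuple):
    flag: str                # ' ' in sync, '-' not initialized, '+' differs, 'U' conflicts
    sha: str                 # the commit hash: the only thing the superproject records
    path: str                # submodule path inside the superproject
    describe: Optional[str]  # `git describe` text; derived for display, may be absent


# Assumption: 40 hex digits (SHA-1) up to 64 (SHA-256 repositories).
_STATUS_LINE = re.compile(
    r"^([ +\-U])([0-9a-f]{40,64})\s+(.+?)(?:\s+\((.*)\))?$"
)


def parse_submodule_status(line: str) -> SubmoduleStatus:
    """Split one line of `git submodule status` output into its fields."""
    match = _STATUS_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unrecognized submodule status line: {line!r}")
    flag, sha, path, describe = match.groups()
    return SubmoduleStatus(flag, sha, path, describe)
```

Feeding it the `gpcontrib/yezzey` line from the output above yields the hash in `sha` and `1.8.8` in `describe`; for a line printed without the parenthesised part, `describe` is `None` while `sha` is unchanged, which is the sense in which the tag is not part of the stored link.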
