Add Yezzey submodule to gpcontrib #1752
Yezzey is an open-source extension for Apache Cloudberry and Greenplum 6 that transparently offloads Append-Only (AO/AOCO) table data to S3-compatible object storage. Inspired by Snowflake and AnyBlob, it extends the storage manager (smgr) so that reads and writes go to S3 instead of local disk, while the user interface stays unchanged. A companion YProxy service acts as an I/O scheduler, managing connection pooling and request prioritization to prevent S3 throttling. Data is PGP-encrypted during upload. Benchmarks show only a 10–43% query slowdown versus local storage, far outperforming PXF, which makes it well suited to cost-effective cold-data tiering.
The main feature of Yezzey is that you don't need to change tables or code: just run yezzey_define_offload_policy and the data moves to S3. In this way, you can offload data from your cluster and free up local disk space.
Currently, it is widely used on Greenplum 6 instances, and the goal is to provide users with the same interface in Cloudberry so they can migrate seamlessly.
We placed Yezzey as a submodule because we believe that one day we will replace all outdated solutions like AO/AOCO/Yezzey with PAX. However, that has not happened yet, and we still need Yezzey.
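Since the extension is placed as a git submodule, it may help to recall what the superproject actually records: a path and URL in .gitmodules, plus a single commit SHA (a "gitlink" entry), not a tag or branch name. The following is a minimal sketch in a throwaway directory. The stand-in repository, its contents, the tag name v1.0, and the local paths are all hypothetical; the gpcontrib/yezzey path is assumed from the PR title, and nothing here touches the real Yezzey or Cloudberry repositories.

```shell
#!/bin/sh
# Throwaway demo: every repository, tag, and path below is hypothetical.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# 1. A stand-in "extension" repository with one tagged release
#    followed by a later, untagged commit.
git init -q "$work/ext"
echo 'release 1' > "$work/ext/README"
git -C "$work/ext" add README
git -C "$work/ext" commit -q -m 'initial release'
git -C "$work/ext" tag v1.0
echo 'work in progress' > "$work/ext/README"
git -C "$work/ext" commit -q -am 'later work'

# 2. A stand-in superproject that adds the extension under gpcontrib/.
#    protocol.file.allow is only needed because the demo clones from a local path.
git init -q "$work/super"
git -C "$work/super" -c protocol.file.allow=always \
    submodule --quiet add "$work/ext" gpcontrib/yezzey

# 3. Pin the submodule to the tagged release and commit the pointer.
git -C "$work/super/gpcontrib/yezzey" checkout -q v1.0
git -C "$work/super" add gpcontrib/yezzey
git -C "$work/super" commit -q -m 'Add submodule pinned at v1.0'

# 4. Compare what the superproject stores with what the tag points at.
#    ls-tree prints: <mode> <type> <sha><TAB><path>
pinned=$(git -C "$work/super" ls-tree HEAD gpcontrib/yezzey | awk '{print $3}')
tagged=$(git -C "$work/ext" rev-parse 'v1.0^{commit}')
latest=$(git -C "$work/ext" rev-parse HEAD)

echo "pinned commit: $pinned"
echo "tag v1.0:      $tagged"
echo "upstream HEAD: $latest"
cat "$work/super/.gitmodules"
```

The pinned SHA matches the commit behind the tag, but the tag name itself appears nowhere in the superproject: .gitmodules holds only the path and URL. Anyone who wants to know which release a submodule is pinned to has to resolve the SHA inside the submodule, for example with git describe --tags.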
It's part of our roadmap; we discussed it in #868. See the item
We also need to add the
For managing the new submodule, we can introduce it in the same way as discussed here: #1084 (review)
Fixed
Yes, I did as described. The issue is that no tag information is stored, and the tag is an ephemeral entity shown only in the Here yezzey linked with
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions