[flink] Support blob compaction for data evolution tables #7932

Merged
JingsongLi merged 18 commits into apache:master from leaves12138:flink-blob-data-evolution-compact
May 22, 2026
@leaves12138
Contributor

Purpose

Enable Flink data-evolution compaction to plan and execute blob compaction tasks.

This implements DataEvolutionCompactTask support for blobTask by rewriting dedicated blob files with MultipleBlobFileWriter. It also enables the Flink data-evolution compact source to plan blob tasks.

The reader path now also supports blob-only merge groups, which is required when a data-evolution blob task contains multiple blob fields without the corresponding normal data file in the same split.
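To make the "blob-only merge group" case concrete, here is a minimal sketch in plain Java. It is not Paimon code: `FileInfo`, `isBlobOnly`, and `groupByBlobField` are hypothetical names, and the model assumes each file in a split is either a normal data file or a dedicated blob file for exactly one blob field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BlobOnlyGroupSketch {

    /** Hypothetical stand-in for a file in a split; blobField is null for a normal data file. */
    static final class FileInfo {
        final String name;
        final String blobField;

        FileInfo(String name, String blobField) {
            this.name = name;
            this.blobField = blobField;
        }

        boolean isBlob() {
            return blobField != null;
        }
    }

    /** A split is blob-only when it is non-empty and contains no normal data file. */
    static boolean isBlobOnly(List<FileInfo> files) {
        return !files.isEmpty() && files.stream().allMatch(FileInfo::isBlob);
    }

    /** Groups blob file names by the blob field they belong to; normal files are skipped. */
    static Map<String, List<String>> groupByBlobField(List<FileInfo> files) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (FileInfo f : files) {
            if (f.isBlob()) {
                groups.computeIfAbsent(f.blobField, k -> new ArrayList<>()).add(f.name);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<FileInfo> split = List.of(
                new FileInfo("a-0.blob", "image"),
                new FileInfo("a-1.blob", "image"),
                new FileInfo("b-0.blob", "video"));
        System.out.println(isBlobOnly(split));
        System.out.println(groupByBlobField(split));
    }
}
```

In this model, a reader that always expects a normal data file to anchor the merge would fail on such a split, which is the situation the description says the reader path now handles.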

Tests

  • mvn -pl paimon-core -DfailIfNoTests=false -Dtest=MultipleBlobTableTest#testDataEvolutionBlobCompaction test
  • mvn -pl paimon-flink/paimon-flink-common -DfailIfNoTests=false -Dtest=BlobTableITCase#testBlobCompaction+testMultipleBlobCompaction test
  • mvn -pl paimon-core -DfailIfNoTests=false -Dtest=MultipleBlobTableTest test
  • mvn -pl paimon-flink/paimon-flink-common -DfailIfNoTests=false -Dtest=BlobTableITCase test

@leaves12138 leaves12138 marked this pull request as draft May 22, 2026 04:02
@leaves12138 leaves12138 marked this pull request as ready for review May 22, 2026 06:36

@JingsongLi JingsongLi left a comment

  1. Should blob compaction be controlled by an option?
  2. In CompactPlanner and DataEvolutionCompactTask, the logic for parsing the field ID of blob files is basically the same.
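Both review points can be sketched together in plain Java: a single helper that both the planner and the task could call, plus an option gate. This is an illustration only, not the PR's code. The class `BlobFieldIds`, its method names, the `writeCols` and `fieldIdByName` inputs, and the `false` default are assumptions; the option key is the name suggested later in this review.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BlobFieldIds {

    // Option key as suggested later in the review; the default used below is an assumption.
    static final String BLOB_COMPACTION_ENABLED = "blob-compaction.enabled";

    private BlobFieldIds() {}

    /** Returns true only when the option is present and set to "true" (case-insensitive). */
    static boolean blobCompactionEnabled(Map<String, String> options) {
        return Boolean.parseBoolean(options.getOrDefault(BLOB_COMPACTION_ENABLED, "false"));
    }

    /**
     * Resolves the field IDs of the columns a blob file wrote.
     * Hypothetical inputs: the column names recorded for the file and a name-to-ID map
     * taken from the table schema.
     */
    static Set<Integer> fieldIdsOf(List<String> writeCols, Map<String, Integer> fieldIdByName) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (String col : writeCols) {
            Integer id = fieldIdByName.get(col);
            if (id == null) {
                throw new IllegalArgumentException("Unknown blob column: " + col);
            }
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> options = Map.of(BLOB_COMPACTION_ENABLED, "true");
        if (blobCompactionEnabled(options)) {
            System.out.println(fieldIdsOf(List.of("image", "video"), Map.of("image", 3, "video", 4)));
        }
    }
}
```

Keeping the lookup in one place means the planner and the task cannot drift apart on how a blob file maps to a field ID.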

        .rawConvertible(false)
        .build();
RecordReader<InternalRow> reader =
        store.newDataEvolutionRead().withReadType(blobWriteType).createReader(dataSplit);

You should use blob-as-descriptor to read blob.
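One way to read this suggestion: in descriptor mode a blob value is a reference (location, offset, length) rather than inline bytes, so a compaction rewrite can stream byte ranges from the old files instead of materializing every blob in memory. The sketch below shows only that idea with `java.nio`. The `Descriptor` class and the `copy` and `demo` methods are hypothetical and do not reflect Paimon's reader or writer API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DescriptorCopySketch {

    /** Hypothetical blob reference: a byte range inside an existing file. */
    static final class Descriptor {
        final Path file;
        final long offset;
        final long length;

        Descriptor(Path file, long offset, long length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Streams the referenced byte range into out without buffering the whole blob in memory. */
    static long copy(Descriptor d, FileChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(d.file, StandardOpenOption.READ)) {
            long copied = 0;
            while (copied < d.length) {
                long n = in.transferTo(d.offset + copied, d.length - copied, out);
                if (n <= 0) {
                    throw new IOException("Unexpected end of file: " + d.file);
                }
                copied += n;
            }
            return copied;
        }
    }

    /** Copies bytes 6..10 of a small temp file into a new file and returns the copied text. */
    static String demo() {
        try {
            Path src = Files.createTempFile("blob-src", ".bin");
            Path dst = Files.createTempFile("blob-dst", ".bin");
            try {
                Files.write(src, "hello world".getBytes(StandardCharsets.UTF_8));
                try (FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
                    copy(new Descriptor(src, 6, 5), out);
                }
                return new String(Files.readAllBytes(dst), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dst);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```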

@leaves12138 leaves12138 force-pushed the flink-blob-data-evolution-compact branch from 3d2bd6e to ec28e95 Compare May 22, 2026 08:50
Comment thread docs/generated/core_configuration.html Outdated
<td>The TTL in rocksdb index for cross partition upsert (primary keys not contain all partition fields), this can avoid maintaining too many indexes and lead to worse and worse performance, but please note that this may also cause data duplication.</td>
</tr>
<tr>
<td><h5>data-evolution.compaction.blob.enabled</h5></td>

blob-compaction.enabled

Contributor Author

Fixed

@JingsongLi
Contributor

+1

@JingsongLi JingsongLi merged commit ba967e1 into apache:master May 22, 2026
12 of 14 checks passed
2 participants