[common] Introduce RTreeFileIndex for query optimization#7919
[common] Introduce RTreeFileIndex for query optimization#7919xuzifu666 wants to merge 19 commits into
Conversation
| } else { | ||
| for (int i = 0; i < entryCount; i++) { | ||
| RTreeNode child = new RTreeNode(dimensions, maxEntries, false); | ||
| node.addChild(child); |
There was a problem hiding this comment.
RTreeNode.isLeaf is final. When the RTree constructor creates root, isLeaf=true. However, during deserialization, if the root of the tree is an internal node, the deserialization code will add children to the node with isLeaf=true. Afterwards, when searching, node. isLeaf() returns true and will look for leafRowIds instead of recursive children. The deserialized tree cannot be queried correctly at all.
There was a problem hiding this comment.
I changed private final boolean isLeaf → private boolean isLeaf, added setLeaf(boolean) method, then called setLeaf() during deserialization to correct the root node's leaf flag.
|
|
||
| RTreeNode newNode = new RTreeNode(dimensions, maxEntries, true); | ||
|
|
||
| int mid = entries.size() / 2; |
There was a problem hiding this comment.
This completely disregards spatial location. The correct R-Tree splitting should minimize the MBR overlap between two result nodes, otherwise a large number of nodes will "intersect" during queries, degenerating into linear scans.
There was a problem hiding this comment.
I did these changes:
- Implemented QuadraticSplit algorithm (2-step approach):
a. PickSeeds: Select two entries with maximum distance as initial seeds
b. Assign: Assign remaining entries to group with minimum bbox expansion - Implemented QuadraticSplitInternal: Same algorithm for internal node splitting
- Modified
RTree.javato use QuadraticSplit instead of linear splitting
|
|
||
| if (node.isLeaf()) { | ||
| for (Integer rowId : node.getLeafRowIds()) { | ||
| results.add(rowId); |
There was a problem hiding this comment.
Directly joined without checking if entry.bbox intersects with searchBox
There was a problem hiding this comment.
Changed as:
- Enhanced
RTree.javasearch() method:
a. Leaf nodes: Added per-entry checking entry.getBbox().intersects(searchBox)
b. Previously only checked node.getBoundingBox().intersects() - LeafEntry structure: Stores rowId + bbox for precision verification
| import org.apache.paimon.types.DataType; | ||
|
|
||
| /** The implementation of R-Tree file index. */ | ||
| public class RTreeFileIndex implements FileIndexer { |
There was a problem hiding this comment.
As a File Index, all data is already known at the time of writing, and the quality of the tree constructed by inserting each item is much lower than that of STR (Sort Tile Recursive) bulk loading
There was a problem hiding this comment.
My current alternative is:
- Implemented STRBulkLoader (Sort-Tile-Recursive algorithm):
a. Sort entries by current dimension
b. Partition into vertical tiles (~maxEntries per tile)
c. Recursively process each tile with next dimension
d. Build tree bottom-up - Enhanced RTreeFileIndexWriter:
a. write() method: Collect entries into list
b. serializedBytes(): Use STRBulkLoader for batch tree construction
|
@JingsongLi Thank you for the review! Your comments are very helpful, and I will refine them based on these issues. |
|
Thanks for the update. I found two blocking correctness/contract issues in the current head (
I reproduced both with a small contract test:
The existing tests pass because they mostly call Please fix these before merge:
|
Thank you for the suggestion!I made relevant changes, PTAL @leaves12138 |
|
Currently all comments should have be addressed in my view, could you help to review it again when you have a time~ @JingsongLi |
JingsongLi
left a comment
There was a problem hiding this comment.
Review
Design Issues
1. visitEqual semantics don't fit spatial queries.
The reader only overrides visitEqual. In practice, spatial queries are range queries (WHERE point WITHIN bbox), not equality checks. The predicate system's Equal function passes the literal through LeafPredicate.literals(), which gets serialized/deserialized via JSON — a double[] or BoundingBox object won't survive that roundtrip without custom serialization support in LeafPredicate.deserializeLiterals().
How does the user actually trigger this index from SQL? There's no predicate pushdown path that converts a spatial function (e.g., ST_Within, ST_Intersects) into a visitEqual call with a BoundingBox literal. This means the index is unreachable from the query engine as implemented.
2. Row ID uses int — overflows at 2B+ rows.
LeafEntry.rowId is int. A single Parquet file can exceed 2 billion rows. The serialization format also uses dos.writeInt(entry.getRowId()). This should be long to match Paimon's row addressing.
3. The entire R-Tree is deserialized into memory at read time.
RTreeFileIndexReader.deserializeRTree() reconstructs the full tree in Java heap with all node objects. For a data file with millions of points, this creates millions of RTreeNode, LeafEntry, BoundingBox objects — severe GC pressure. Compare with the bitmap index which uses a compact RoaringBitmap32 that stays serialized until needed.
Consider a zero-copy or on-disk traversal approach: serialize in BFS/DFS order, seek through the byte stream during search without materializing all nodes.
4. RTreeIndexResult.remain() is semantically wrong.
public boolean remain() {
return !getBitmap().isEmpty();
}remain() should return true when the file might contain matching rows (i.e., cannot be skipped). A non-empty bitmap correctly means "some rows match, keep the file." However, what remain() does NOT do is filter at row level — the bitmap is constructed but never exposed to the reader pipeline for row-level filtering. The RTreeIndexResult has a rowCount field but it's unused for anything meaningful.
If the goal is row-level filtering (skip specific rows within a file), the result should implement a BitmapIndexResult-compatible interface. Otherwise the R-Tree only provides file-level skip/remain decisions, which limits its value to files with very few matching rows.
5. Writer uses one-by-one insertion, then bulk-loads — confusing dual path.
The writer collects all entries in a List<LeafEntry>, then calls buildRTreeWithSTRBulkLoader(). Good — STR bulk loading is efficient. But the RTree.insert() method (quadratic split path) is unused in production and only exists for tests/benchmarks. This dead code adds maintenance burden. Consider removing the dynamic insertion path or clearly marking it as test-only.
Implementation Issues
6. BoundingBox.fromPoint() clones min but shares the reference for max.
public static BoundingBox fromPoint(double[] point) {
return new BoundingBox(point, point);
}The constructor clones both, so this is safe. But it creates two array copies for every point — significant allocation pressure during bulk insert. Consider a specialized point constructor that stores a single array.
7. QuadraticSplit is O(n²) and duplicates logic.
QuadraticSplit and QuadraticSplitInternal are nearly identical except for the entry type (LeafEntry vs RTreeNode). This should be a single generic class. More importantly, since the writer uses STR bulk loading, the quadratic split is only triggered during dynamic insertion (which is unused in production). Dead code.
8. No support for visitIn, visitLessThan, visitGreaterThan, visitBetween.
Spatial queries naturally map to range predicates. The reader doesn't override visitBetween or visitLessOrEqual/visitGreaterOrEqual, which means multi-dimensional range filters from the optimizer won't use this index.
9. STRBulkLoader.buildLevel() modifies the input list by sorting it.
entries.sort((a, b) -> Double.compare(...));This mutates the caller's list. Should sort a copy (the new ArrayList<>(entries) is done at the top level but not in recursive calls since entries.subList(i, end) is a view).
10. Serialization uses Java DataOutputStream — not portable.
Paimon's other indexes use MemorySegment-based serialization for alignment with the internal memory model. Using Java DataOutputStream/DataInputStream means no zero-copy reading, extra allocation, and platform-dependent behavior for edge cases.
Test Issues
11. Benchmark classes are in src/test but aren't JUnit tests.
RTreeBenchmark and RTreeVsLinearScanBenchmark have public static void main() methods but no @Test annotations. They won't run in CI. Either convert to JMH with proper @Benchmark annotations, or move to a dedicated benchmark module. RTreeJMHBenchmark exists but it's also in test — it needs the JMH dependency properly configured.
12. Too many test classes for a single PR.
8 test classes (BoundingBoxTest, RTreeCriticalFixTest, RTreeFileIndexTest, RTreeIntegrationTest, RTreeQuadraticSplitTest, RTreeSTRBulkLoaderTest, RTreeSerializationTest, RTreeSplitFixTest, RTreeTest) plus 3 benchmark classes. This is hard to review. Consolidate to 2-3 test classes: unit tests, integration test, and (optional) benchmark.
Summary
The core R-Tree algorithm implementation is reasonable, but the integration with Paimon's predicate/file-index framework is incomplete. Without a way to push spatial predicates from the query engine to visitEqual with proper literal serialization, this index cannot be triggered from SQL. I'd suggest:
- Define how spatial predicates flow from SQL → optimizer → file index reader
- Use
longfor row IDs - Avoid full in-memory deserialization — use on-disk traversal
- Remove dead code (dynamic insertion, quadratic split) from production
- Make
RTreeIndexResultinteroperable withBitmapIndexResultfor row-level filtering
Purpose
Paimon currently does not support rtree indexes. Refer to this paper https://postgis.net/docs/support/rtree.pdf for implementation instructions on how to implement this index.
The following are the relevant benchmark test results:
Hardware Configuration
Software Configuration
Test Parameters
Query Performance (10,000 queries)
Analysis by Dataset Size
Point Query vs Range Query
Search Performance on 100K Dataset:
Improvement vs Linear Scan:
Point query: 214× speedup
Range query: 182× speedup
Sequential Data Access Pattern
Tests
Run comparison benchmark
org.apache.paimon.fileindex.rtree.RTreeVsLinearScanBenchmark
Run detailed benchmark
org.apache.paimon.fileindex.rtree.RTreeBenchmark
###Here is a schematic diagram of the implementation:
2.1 Full Node Before Split
2.2 Split Process
2.3 Leaf Node Content Details
3.1 Point Query Flow
3.2 Range Query Flow
Example 1: Point Query
Example 2: Range Query
Memory Layout