[PoC] perf: optimize group-only group-by case for primitive cases (clickbench q4)#22145
[PoC] perf: optimize group-only group-by case for primitive cases (clickbench q4)#22145waynexia wants to merge 6 commits into
Conversation
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
|
run benchmarks |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
2 similar comments
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
|
run benchmarks |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
2 similar comments
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
run benchmarks |
|
Hi @waynexia, thanks for the request (#22145 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, coderfender, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mkleen, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas. File an issue against this benchmark runner |
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing q4 (65980f0) to 7f2f78d (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing q4 (65980f0) to 7f2f78d (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing q4 (65980f0) to 7f2f78d (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
Hmmm, the result does not seem good 🤔 Let me dig deeper |
Seems like the query is faster though (and memory is ~halved) so seems like a great optimization - I expect it for higher cardinality |
|
run benchmark clickbench_partitioned clickbench_extended |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing q4 (65980f0) to 7f2f78d (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing q4 (65980f0) to 7f2f78d (merge-base) diff using: clickbench_extended File an issue against this benchmark runner |
| /// is obvious in high cardinality group by situation. | ||
| /// More details can see: | ||
| /// <https://github.com/apache/datafusion/issues/15961> | ||
| map: HashTable<(usize, u64)>, |
There was a problem hiding this comment.
I found HashTable<(usize, T::Native)> also to be a bit faster than this current approach (and also saving memory). We can build the Vec<T::Native> when emitting data.
Could be part of some future PR.
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_extended — base (merge-base)
clickbench_extended — branch
File an issue against this benchmark runner |
Some queries regress a lot... |
This is likely noise, best is to look at the lowest reported times in those cases. |
|
I rerun it use an aws c6a.4xlarge instance. Improvement on q4 is reproduced, but still has lots of regressions and some of them seem to be real. I'll verify them on my local environment next Details |
|
rerun another round and those regression disappear 😼 |
Which issue does this PR close?
Rationale for this change
This is an (agent made) PoC branch for clickbench q4
SELECT COUNT(DISTINCT UserID) FROM hits;. In my test environment it's about 25% faster thanmain.The basic idea is to skip maintaining group-id when there is no requirement of it, e.g., for group-by cases without any accumulator.
Other findings on this experiment:
SELECT COUNT(DISTINCT SearchPhrase) FROM hits;where the key is string type, the group calculation is not the critical pathWhat changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?