Describe the bug, including details regarding any error messages, version, and platform.
Problem
It is possible to register different Gandiva functions with the same alias and parameters but different return types, resulting in confusing function overloads.
For example, DATE_EXTRACTION_TRUNCATION_FNS in cpp/src/gandiva/function_registry_datetime.cc was invoked twice with the same list of SQL aliases, once for extract* (which returns int64) and once for date_trunc_* (which returns the input date or timestamp type):
DATE_EXTRACTION_TRUNCATION_FNS(EXTRACT_SAFE_NULL_IF_NULL, extract)
DATE_EXTRACTION_TRUNCATION_FNS(TRUNCATE_SAFE_NULL_IF_NULL, date_trunc_)
As a result, the registry contained four entries for day(...) where there should have been two:
int64 day(timestamp) → extractDay_timestamp
int64 day(date) → extractDay_date64
timestamp day(timestamp) → date_trunc_Day_timestamp
date day(date) → date_trunc_Day_date64
The same problem existed for every calendar-unit alias: year, month, quarter, week, weekofyear, yearweek, dayofmonth, hour, minute, second. Resolution behavior depended on the caller's inferred return type, which is not the SQL semantics anyone expects from day(timestamp_col).
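The kind of conflict described above can be detected mechanically. The following is a minimal sketch, not Gandiva code: `ToySignature` and `FindReturnTypeConflicts` are hypothetical names, and types are represented as plain strings for illustration. It groups signatures by alias and parameter type and reports every call form registered with more than one return type.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified signature record. This is NOT Gandiva's
// FunctionSignature; types are plain strings to keep the sketch small.
struct ToySignature {
  std::string alias;
  std::string param_type;
  std::string return_type;
};

// Returns every "alias(param_type)" call form that is registered with more
// than one distinct return type, in lexicographic order.
inline std::vector<std::string> FindReturnTypeConflicts(
    const std::vector<ToySignature>& signatures) {
  // Collect the set of return types seen for each call form.
  std::map<std::string, std::set<std::string>> return_types_by_call;
  for (const auto& sig : signatures) {
    const std::string call = sig.alias + "(" + sig.param_type + ")";
    return_types_by_call[call].insert(sig.return_type);
  }

  // A call form is ambiguous if it maps to two or more return types.
  std::vector<std::string> conflicts;
  for (const auto& entry : return_types_by_call) {
    if (entry.second.size() > 1) {
      conflicts.push_back(entry.first);
    }
  }
  return conflicts;
}
```

Fed the four day(...) entries listed above, this sketch would flag both day(timestamp) and day(date64), because each appears once with an int64 return type and once with a date or timestamp return type.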
FunctionRegistry::Add silently accepted these registrations: unordered_map::emplace keeps the first entry for a key and discards later ones without any warning.
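The emplace behavior is easy to reproduce in isolation. The sketch below uses a toy registry keyed by a string; `ToyRegistry`, `AddSilently`, and `AddChecked` are hypothetical names, and how the real registry builds its keys is not shown here. It contrasts ignoring the result of `std::unordered_map::emplace` with checking the `bool` it returns, which is one possible way to surface a duplicate registration.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a registry keyed by a signature string.
// This is NOT Gandiva's FunctionRegistry; names are illustrative only.
using ToyRegistry = std::unordered_map<std::string, std::string>;

// Mirrors the silent behavior: if the key already exists, emplace leaves the
// existing entry untouched and the new value is dropped without any signal.
inline void AddSilently(ToyRegistry& registry, const std::string& signature,
                        const std::string& impl) {
  registry.emplace(signature, impl);
}

// Checked variant: emplace returns a pair whose second member is false when
// the key was already present, so the caller can report the duplicate.
inline bool AddChecked(ToyRegistry& registry, const std::string& signature,
                       const std::string& impl) {
  return registry.emplace(signature, impl).second;
}
```

With `AddSilently`, registering a second implementation under an existing key leaves the map unchanged and gives the caller no indication. `AddChecked` returns false in the same situation, which a real registry could turn into an error status.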
Component(s)
Gandiva