-
Notifications
You must be signed in to change notification settings - Fork 501
Optimizer does not successfully time out when query has a limit #1414
Description
Bug Report
Summary
When you have a complicated join with a LIMIT clause, then the optimizer fails to time out. Suppose that we have the tables from this ddl: https://github.com/oltpbenchmark/oltpbench/blob/master/src/com/oltpbenchmark/benchmarks/tpch/ddls/tpch-postgres-ddl.sql. Then if we run the following query:
select
s_acctbal
from
part,
partsupp,
supplier,
nation,
region,
customer,
orders,
lineitem
where
p_partkey = ps_partkey
and ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and n_nationkey = c_nationkey
and c_custkey = o_custkey
and o_orderkey = l_orderkeyThen we should expect that there are so many join orderings that DBMS optimizer will time out. In fact if you run that query, then it will time out, print the following in the console: [2020-12-23 22:05:03.397] [optimizer_logger] [warning] Optimize Loop ended prematurely: Optimizer task execution timed out, and return a result.
However if we run the following query (the only difference is the added LIMIT clause):
select
s_acctbal
from
part,
partsupp,
supplier,
nation,
region,
customer,
orders,
lineitem
where
p_partkey = ps_partkey
and ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and n_nationkey = c_nationkey
and c_custkey = o_custkey
and o_orderkey = l_orderkey
LIMIT 100;Then the optimizer WILL NOT time out.
The offending line is right here:
noisepage/src/optimizer/optimizer.cpp
Lines 154 to 156 in 8855886
| if (elapsed_time >= task_execution_timeout_ && root_group->HasExpressions(required_props)) { | |
| throw OPTIMIZER_EXCEPTION("Optimizer task execution timed out"); | |
| } |
Although
elapsed_time gets much greater than task_execution_timeout_, root_group->HasExpressions(required_props) will return false. The code for HasExpressions is here: noisepage/src/optimizer/group.cpp
Lines 72 to 75 in 8855886
| bool Group::HasExpressions(PropertySet *properties) const { | |
| const auto &it = lowest_cost_expressions_.find(properties); | |
| return (it != lowest_cost_expressions_.end()); | |
| } |
For this query,
lowest_cost_expressions_ is empty so HasExpressions will always return false. For the original query lowest_cost_expressions_ is not empty and contains some result. For both queries required_props is empty, which is a little weird to me, but I don't really know what required_props is, so this might be fine.
I don't know enough about the optimizer to know exactly why this is happening, but my guess at a very high conceptual level is, generating a physical plan for the LIMIT happens after we enumerate the join orderings. Since we don't have a physical plan for the limit, we can't time out during the join enumeration.
It's very likely that #1229 is suffering from this.
Environment
OS: Ubuntu (LTS) 20.04
Compiler: GCC 7.0+
CMake Profile: Debug
Jenkins/CI: N/A
Steps to Reproduce
- Compile with the following args: -DCMAKE_BUILD_TYPE=Debug -DTERRIER_USE_ASAN=ON
- Run terrier with parallel execution turned off terrier -parallel_execution=false
- Run the following: https://github.com/oltpbenchmark/oltpbench/blob/master/src/com/oltpbenchmark/benchmarks/tpch/ddls/tpch-postgres-ddl.sql
- Run:
select
s_acctbal
from
part,
partsupp,
supplier,
nation,
region,
customer,
orders,
lineitem
where
p_partkey = ps_partkey
and ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and n_nationkey = c_nationkey
and c_custkey = o_custkey
and o_orderkey = l_orderkey
LIMIT 100;Expected Behavior
noisepage=# select
noisepage-# s_acctbal
noisepage-# from
noisepage-# part,
noisepage-# partsupp,
noisepage-# supplier,
noisepage-# nation,
noisepage-# region,
noisepage-# customer,
noisepage-# orders,
noisepage-# lineitem
noisepage-# where
noisepage-# p_partkey = ps_partkey
noisepage-# and ps_suppkey = s_suppkey
noisepage-# and s_nationkey = n_nationkey
noisepage-# and n_regionkey = r_regionkey
noisepage-# and n_nationkey = c_nationkey
noisepage-# and c_custkey = o_custkey
noisepage-# and o_orderkey = l_orderkey
noisepage-# limit 100;
s_acctbal
-----------
(0 rows)
Actual Behavior
noisepage=# select
noisepage-# s_acctbal
noisepage-# from
noisepage-# part,
noisepage-# partsupp,
noisepage-# supplier,
noisepage-# nation,
noisepage-# region,
noisepage-# customer,
noisepage-# orders,
noisepage-# lineitem
noisepage-# where
noisepage-# p_partkey = ps_partkey
noisepage-# and ps_suppkey = s_suppkey
noisepage-# and s_nationkey = n_nationkey
noisepage-# and n_regionkey = r_regionkey
noisepage-# and n_nationkey = c_nationkey
noisepage-# and c_custkey = o_custkey
noisepage-# and o_orderkey = l_orderkey;
noisepage-# limit 100;
This keeps running for a long time without ever timing out. I've never stuck around long enough to confirm that it ever finishes but I'm assuming that it does eventually. As you remove more and more tables from the FROM clause then it will start to terminate and execute faster.