Skip to content
This repository was archived by the owner on Feb 20, 2023. It is now read-only.
This repository was archived by the owner on Feb 20, 2023. It is now read-only.

Optimizer does not successfully time out when query has a limit #1414

@jkosh44

Description

@jkosh44

Bug Report

Summary

When you have a complicated join with a LIMIT clause, then the optimizer fails to time out. Suppose that we have the tables from this ddl: https://github.com/oltpbenchmark/oltpbench/blob/master/src/com/oltpbenchmark/benchmarks/tpch/ddls/tpch-postgres-ddl.sql. Then if we run the following query:

select
        s_acctbal
from
        part,
        partsupp,
        supplier,
        nation,
        region,
        customer,
        orders,
        lineitem
where
        p_partkey = ps_partkey
        and ps_suppkey = s_suppkey 
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and n_nationkey = c_nationkey
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey

Then we should expect that there are so many join orderings that DBMS optimizer will time out. In fact if you run that query, then it will time out, print the following in the console: [2020-12-23 22:05:03.397] [optimizer_logger] [warning] Optimize Loop ended prematurely: Optimizer task execution timed out, and return a result.

However if we run the following query (the only difference is the added LIMIT clause):

select
        s_acctbal
from
        part,
        partsupp,
        supplier,
        nation,
        region,
        customer,
        orders,
        lineitem
where
        p_partkey = ps_partkey
        and ps_suppkey = s_suppkey 
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and n_nationkey = c_nationkey
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
LIMIT 100;

Then the optimizer WILL NOT time out.

The offending line is right here:

if (elapsed_time >= task_execution_timeout_ && root_group->HasExpressions(required_props)) {
throw OPTIMIZER_EXCEPTION("Optimizer task execution timed out");
}

Although elapsed_time gets much greater than task_execution_timeout_, root_group->HasExpressions(required_props) will return false. The code for HasExpressions is here:
bool Group::HasExpressions(PropertySet *properties) const {
const auto &it = lowest_cost_expressions_.find(properties);
return (it != lowest_cost_expressions_.end());
}

For this query, lowest_cost_expressions_ is empty so HasExpressions will always return false. For the original query lowest_cost_expressions_ is not empty and contains some result. For both queries required_props is empty, which is a little weird to me, but I don't really know what required_props is, so this might be fine.

I don't know enough about the optimizer to know exactly why this is happening, but my guess at a very high conceptual level is, generating a physical plan for the LIMIT happens after we enumerate the join orderings. Since we don't have a physical plan for the limit, we can't time out during the join enumeration.

It's very likely that #1229 is suffering from this.

Environment

OS: Ubuntu (LTS) 20.04

Compiler: GCC 7.0+

CMake Profile: Debug

Jenkins/CI: N/A

Steps to Reproduce

  1. Compile with the following args: -DCMAKE_BUILD_TYPE=Debug -DTERRIER_USE_ASAN=ON
  2. Run terrier with parallel execution turned off terrier -parallel_execution=false
  3. Run the following: https://github.com/oltpbenchmark/oltpbench/blob/master/src/com/oltpbenchmark/benchmarks/tpch/ddls/tpch-postgres-ddl.sql
  4. Run:
select
        s_acctbal
from
        part,
        partsupp,
        supplier,
        nation,
        region,
        customer,
        orders,
        lineitem
where
        p_partkey = ps_partkey
        and ps_suppkey = s_suppkey 
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and n_nationkey = c_nationkey
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
LIMIT 100;

Expected Behavior

noisepage=# select
noisepage-#         s_acctbal
noisepage-# from
noisepage-#         part,
noisepage-#         partsupp,
noisepage-#         supplier,
noisepage-#         nation,
noisepage-#         region,
noisepage-#         customer,
noisepage-#         orders,
noisepage-#         lineitem
noisepage-# where
noisepage-#         p_partkey = ps_partkey
noisepage-#         and ps_suppkey = s_suppkey 
noisepage-#         and s_nationkey = n_nationkey
noisepage-#         and n_regionkey = r_regionkey
noisepage-#         and n_nationkey = c_nationkey
noisepage-#         and c_custkey = o_custkey
noisepage-#         and o_orderkey = l_orderkey
noisepage-# limit 100;
 s_acctbal 
-----------
(0 rows)

Actual Behavior

noisepage=# select
noisepage-#         s_acctbal
noisepage-# from
noisepage-#         part,
noisepage-#         partsupp,
noisepage-#         supplier,
noisepage-#         nation,
noisepage-#         region,
noisepage-#         customer,
noisepage-#         orders,
noisepage-#         lineitem
noisepage-# where
noisepage-#         p_partkey = ps_partkey
noisepage-#         and ps_suppkey = s_suppkey 
noisepage-#         and s_nationkey = n_nationkey
noisepage-#         and n_regionkey = r_regionkey
noisepage-#         and n_nationkey = c_nationkey
noisepage-#         and c_custkey = o_custkey
noisepage-#         and o_orderkey = l_orderkey;
noisepage-# limit 100;

This keeps running for a long time without ever timing out. I've never stuck around long enough to confirm that it ever finishes but I'm assuming that it does eventually. As you remove more and more tables from the FROM clause then it will start to terminate and execute faster.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working (correctness). Mark issues with this.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions